1. Introduction



In this paper, we are concerned with the solution of the equation 



$$Ku = g \tag{1}$$



where $K : U \to V$ is a linear and bounded operator mapping between two Hilbert spaces $U$
and $V$. Equations of type (1) are called well-posed if for given $g \in V$ there exists a unique
solution $u^\dagger$ that depends continuously on the right-hand side $g$. If one of these conditions
is not satisfied, the problem is called ill-posed. In the case of ill-posedness, arbitrarily small
deviations in the right-hand side $g$ may lead to useless solutions $u$ (if solutions exist). These
deviations are commonly modelled as random: they are due to unavoidable numerical errors
as well as to the random nature of the measurement process itself. (Statistical) regularisation
methods aim at computing stable approximations of true solutions $u^\dagger$ from a (statistically)
perturbed signal $g$.

In this paper we assume that we are given the observation

$$Y = Ku^\dagger + \sigma\varepsilon, \tag{2}$$

where $\varepsilon$ is a white noise process on $V$, $\sigma > 0$ is the noise level and
$\mathcal{N}(\mu, \sigma^2)$ denotes the normal distribution with expectation $\mu$ and variance $\sigma^2$. The
white noise model (2) is very common in the theory of statistical inverse problems [see e.g.
HZl EH [201 [lEl El [221 Ej and it can be regarded as a reasonable approximation to models relevant
for many areas of application. A statistical regularisation method amounts to computing an
estimator $\hat u = \hat u(\sigma)$ given the data $Y$ in (2) such that $\hat u(\sigma) \to u^\dagger$ (in an appropriate sense) as
$\sigma \to 0^+$.

The simplest case covered by Model (2) is classical nonparametric regression with its wide
range of applications. Here, $U$ and $V$ are suitable function spaces where it is assumed that $U$
can be continuously embedded into $V$. $U$ models the smoothness of the true signal $u^\dagger$ and $K$
is the embedding operator $K : U \to V$ (cf. [7]). More sophisticated examples for $K$ arise in
imaging, when blurring induced by the recording optical system is modelled as a convolution
with a kernel $k(x - y)$ (in engineering and physical terminology denoted as a point spread
function). Beyond convolution, different operators $K$ occur in various other applications, e.g.
in seismic engineering ([19]), in material sciences ([60]), magnetic resonance imaging ([9]),
image processing (|S]), tomography ([H^IES]) and astrophysics ([H]).

Due to the broad area of applications, the literature on statistical regularisation methods is
vast. We only give a few selected references: penalised least-squares estimation (which includes
Tikhonov-Phillips and maximum entropy regularisation) [6l [551 [SS] , wavelet-based methods
[281 [3Ql [ig HSl [IS], estimation in Hilbert scales [3 HHiSSl [Ml ISSl [SB] and regularisation by
projection [171 ig [22l [5llll].

In this work, we follow a different route and study a variational estimation scheme that
defines estimators $\hat u$ as solutions of

$$\inf_{u \in U} J(u) \quad \text{subject to} \quad T_N\bigl(\sigma^{-1}(Y - Ku)\bigr) \le q_N(\alpha). \tag{3}$$



Here, $J$ is a convex regularisation functional that is supposed to measure the regularity of
candidate estimators $u \in U$, and $T_N$ is a data fidelity term on $V$ that measures the deviation
between the data $Y$ and the estimated image $Ku$. In this work we consider fidelity measures of
the form

$$T_N(v) = \max_{1 \le n \le N} \mu_n(v), \quad \text{for } v \in V. \tag{4}$$

The functions $\mu_n : V \to \mathbb{R}$ are designed to be sensitive to non-random structures in $v$. We
will refer to $T_N$ as a multiresolution statistic (MR-statistic) and to corresponding solutions of
the optimisation problem (3) as statistical multiresolution estimators (SMRE).

The parameter $q_N(\alpha)$ in (3) is chosen to be the $(1 - \alpha)$-quantile of the statistic $T_N(\varepsilon)$ and
governs the trade-off between data fit and regularity. Hence the admissible region

$$\mathcal{A}_N(\alpha) = \{u \in U : T_N(\sigma^{-1}(Y - Ku)) \le q_N(\alpha)\} \tag{5}$$

constitutes a $(1 - \alpha)$-confidence region for a solution $u^\dagger$ of (1), i.e. a region which covers the
true solution with probability at least $1 - \alpha$. This gives the estimation procedure (3) a
precise statistical interpretation: since for each solution $u^\dagger$ of (1) one has $u^\dagger \in \mathcal{A}_N(\alpha)$ with
probability at least $1 - \alpha$, it follows from (3) that

$$\mathbb{P}\bigl(J(\hat u) \le J(u^\dagger)\bigr) \ge 1 - \alpha.$$



Summarising, regularisation methods of type (3) pick, among all estimators $u$ for which the
distance between $Ku$ and the data $Y$ does not exceed the threshold value $q_N(\alpha)$, one with
largest regularity. The probability that this particular estimator is at least as regular as any
solution of (1) is bounded from below by $1 - \alpha$. This is in contrast to many other regularisation
techniques where regularisation parameters merely govern the trade-off between fit-to-data
and smoothness and do not allow such an interpretation. (In the case of wavelet thresholding,
this property was studied in [29].)

Whereas most of the literature is concerned with the proper choice of the regularisation
functional $J$, in this work we will discuss the issue of the data fidelity term $T_N$. We claim
that from a statistical perspective the choice of $T_N$ is of equal importance as the choice of $J$.
In Definition 3.1 below we will delimit a class of feasible functions for $\mu_1, \ldots, \mu_N$ in (4).



However, in order to make ideas clear (and also to justify the notion "multiresolution"), we
will start with a simple, yet illustrative example: let $G \subset [0,1]^d$ be the equi-spaced grid of
points in the unit cube and assume that $V$ consists of all real-valued functions $v : G \to \mathbb{R}$.
Moreover, let $\{S_1, S_2, \ldots, S_N\}$ be a sequence of non-empty subsets of $G$. We define for
$1 \le n \le N$ and $v \in V$ the local average function
$\mu_n(v) = \bigl|\sum_{\nu \in S_n} v_\nu\bigr| / \sqrt{\# S_n}$. Thus, the MR-statistic $T_N$ reads as

$$T_N(\sigma^{-1}(Y - Ku)) = \max_{1 \le n \le N} \frac{1}{\sqrt{\# S_n}} \Bigl| \sum_{\nu \in S_n} \sigma^{-1}(Y - Ku)_\nu \Bigr|. \tag{6}$$

In other words, the statistic $T_N$ returns the largest local average of the residuals $\sigma^{-1}(Y - Ku)$
over the sets $S_1, \ldots, S_N$. Under the hypothesis that $u = u^\dagger$ is the true solution of (1), we have
that $T_N(\sigma^{-1}(Y - Ku^\dagger)) = T_N(\varepsilon)$ does not exceed the threshold $q_N(\alpha)$ with probability at
least $1 - \alpha$. Recall that $\varepsilon$ is a white noise process and hence "oscillates around zero", as an effect
of which the quantile values $q_N(\alpha)$ are relatively small due to cancellations in the sums in (6).
If, however, $u$ is wrongly specified, the residual $Y - Ku$ contains a non-random signal which
may happen to be covered by a set $S_{n_0}$. As an effect, the local average over $S_{n_0}$, and thus also
the statistic $T_N(\sigma^{-1}(Y - Ku))$, becomes relatively large and $u$ lies outside the admissible
domain of the optimisation problem (3).
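
The behaviour just described can be illustrated numerically. The following is a minimal sketch, assuming a one-dimensional grid and taking the sets $S_n$ to be all index intervals with at most `max_len` points; the function name `mr_statistic`, the bump signal and all numerical values are hypothetical choices made for illustration only.

```python
import numpy as np

def mr_statistic(residual, max_len):
    """Largest normalised local sum of `residual` over all index intervals
    {i, ..., j} with at most `max_len` points, i.e. the statistic in (6)
    for an interval system on a one-dimensional grid."""
    r = np.asarray(residual, dtype=float)
    # Prefix sums: csum[j] - csum[i] equals the sum of r[i:j].
    csum = np.concatenate(([0.0], np.cumsum(r)))
    best = 0.0
    for length in range(1, min(max_len, r.size) + 1):
        window_sums = csum[length:] - csum[:-length]
        best = max(best, np.abs(window_sums).max() / np.sqrt(length))
    return best

rng = np.random.default_rng(0)
sigma = 0.05
noise = sigma * rng.standard_normal(1024)

# Correctly specified candidate: the residual is pure noise.
t_noise = mr_statistic(noise / sigma, max_len=20)

# Misspecified candidate: a bump of height 0.1 on 20 points stays in the residual.
bump = np.zeros(1024)
bump[500:520] = 0.1
t_bump = mr_statistic((noise + bump) / sigma, max_len=20)

print(t_noise, t_bump)
```

The interval covering the bump accumulates the non-random part of the residual, whereas the noise contributions partly cancel, so the second value is expected to be markedly larger than the first.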
The choice of the system $\{S_1, \ldots, S_N\}$ is subtle, since it should not miss any non-random
information in the residual, if present. Put differently, it encodes a priori information on where
one expects to encounter non-random behaviour in the residuals of any possible estimator
$\hat u$. Thus, $T_N$ would be most sensitive against a large variety of signals $u$ if we employed a
large number of overlapping sets $S_n$ that cover $G$. This approach, however, turns (3)
into an optimisation problem with a huge number of constraints which is hard to tackle
numerically (this is treated separately in [39]). Besides these numerical difficulties, there
is also a statistical limitation which will be a major issue to be discussed in this paper: if
the dictionary $\{S_1, \ldots, S_N\}$ is too large (in the sense of a metric entropy), the asymptotic
distribution of $T_N$ will degenerate and would hence be useless for our purposes. In practical
situations, a priori knowledge on the true solution $u^\dagger$ of (1) can be used in order to design
dictionaries whose entropy guarantees a non-degenerate limit of $T_N$ and in addition allows one to
derive rates of convergence of the SMRE to the true signal. A similar comment applies to the
choice of the regularisation functional $J$, which models a priori information on the regularity
of the true solution.

As a consequence, the MR-statistic $T_N$ plugged into (3) plays the role of a shape constraint
and the resulting estimation method is capable of adapting the amount of regularisation in
a locally adaptive manner. Put differently, our approach offers a general methodology to
localise any global convex regularisation functional in order to obtain spatial adaptation. This
is in contrast to global data fidelity terms such as the widely used squared 2-norm fidelity (or
any other $p$-norm, $p \ge 1$, for that matter) that do not allow for adaptation to local structures.
This is illustrated in the following example:

Example 1.1. Assume that $U = V = \mathbb{R}^n$ with $n = 1024$ and let $K : U \to V$ be the identity
operator, i.e. (2) can be rewritten as the simple nonparametric regression model

$$Y_i = u_i^\dagger + \sigma\varepsilon_i, \quad i = 1, \ldots, n,$$

with i.i.d. standard normal random variables $\varepsilon_1, \ldots, \varepsilon_n$. The signal $u^\dagger \in U$ and the data $Y$
according to (2) with $\sigma = 0.05$ are depicted in Figure 1.




Figure 1. True signal (left) and data Y (right). 



The signal $u^\dagger$ exhibits kinks, jumps, peaks and smooth portions simultaneously, which
makes estimation a delicate matter. For example, the regularisation functional

$$J(u) = \frac{1}{n} \sum_{k=1}^{n-1} |u_{k+1} - u_k|^2 \tag{7}$$

appears to be well suited to recover at least the smooth parts of the signal, however with
a tendency to "smear out" edges, peaks and kinks. In the following we will show how this
deficiency can be repaired by localising $J$ by means of MR-statistics. To this end we will
compute SMREs, that is, solutions of (3) with $J$ as in (7).

Before we do so, we start with reconstructing $u^\dagger$ by the usual "global" approach for the
purpose of comparison: we compute a $J$-penalised least squares estimator $\hat u_2$, i.e. the solution
of

$$\min_{u \in \mathbb{R}^n} \frac{1}{n} \sum_{i=1}^{n} |Y_i - u_i|^2 + \frac{\lambda}{n} \sum_{i=1}^{n-1} |u_{i+1} - u_i|^2. \tag{8}$$

Here, the proper selection of smoothness amounts to a proper choice of the parameter $\lambda > 0$.
It is instructive to rewrite (8) in a slightly different form, such that the relationship to (3)
becomes obvious: to each $\lambda > 0$ there corresponds a threshold value $q = q(\lambda)$ such that $\hat u_2$
is a solution of

$$\min_{u \in \mathbb{R}^n} \frac{1}{n} \sum_{k=1}^{n-1} |u_{k+1} - u_k|^2 \quad \text{s.t.} \quad \frac{1}{n} \sum_{i=1}^{n} |Y_i - u_i|^2 \le q. \tag{9}$$



The first three panels in the upper row of Figure 2 depict solutions $\hat u_2$ for $q = 25$, $43$ and
$50$. The choice $q = 43$ yields the visually best result; however, it becomes immediately clear
that there are under- and oversmoothed parts in the reconstruction. The latter becomes
clearly visible in the qq-plot of the residual $Y - \hat u_2$ (lower row), which indicates that
there is a significant number of outliers. Note that less oversmoothing, i.e. fewer outliers
in the residuals, can only be achieved at the cost of more artefacts in the reconstruction
(by decreasing $q$) and, vice versa, fewer artefacts only by accepting severe oversmoothing (by
increasing $q$). This is due to the fact that each residual value $Y_i - u_i$ ($1 \le i \le n$) contributes
equally to the quadratic fidelity in (8) (or likewise in (9)), independent of its spatial position.
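
For readers who wish to reproduce the global approach, the quadratic problem (8) has a closed-form solution. The sketch below assumes the normalisation of (8) with prefactors $1/n$ and $\lambda/n$; the function name `penalised_least_squares`, the test signal with a single jump and the values of $\lambda$ are hypothetical and are not taken from the experiments reported here.

```python
import numpy as np

def penalised_least_squares(y, lam):
    """Minimiser of (1/n) * sum (y_i - u_i)^2 + (lam/n) * sum (u_{i+1} - u_i)^2.
    Setting the gradient to zero gives the linear system (I + lam * D^T D) u = y,
    where D is the (n-1) x n forward difference matrix."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)  # row k equals e_{k+1} - e_k
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

rng = np.random.default_rng(1)
signal = np.where(np.arange(200) < 100, 0.0, 1.0)  # hypothetical signal with one jump
y = signal + 0.05 * rng.standard_normal(200)

u_small = penalised_least_squares(y, lam=1.0)    # little smoothing
u_large = penalised_least_squares(y, lam=100.0)  # strong smoothing

roughness = lambda u: np.sum(np.diff(u) ** 2)
misfit = lambda u: np.sum((y - u) ** 2)
print(roughness(u_small), roughness(u_large))
print(misfit(u_small), misfit(u_large))
```

Increasing $\lambda$ lowers the roughness and raises the misfit everywhere at once, which is the global trade-off discussed above: the single parameter cannot smooth the flat parts strongly while leaving the jump sharp.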

To overcome this obvious "lack of locality", we compute solutions of (3) where we employ
the MR-statistic in (6) as fidelity measure. To be more precise, we choose the sets $\{S_1, \ldots, S_N\}$
to consist of all discrete intervals of the type $\{i, \ldots, j\}/n$ with $1 \le i \le j \le n$ and $j - i < 20$
(i.e. $N = 20{,}290$). Put differently, the SMRE $\hat u_{\mathrm{SMRE}}$ is a solution of the convex optimisation
problem

$$\min_{u \in \mathbb{R}^n} \frac{1}{n} \sum_{k=1}^{n-1} |u_{k+1} - u_k|^2 \quad \text{s.t.} \quad \max_{\substack{1 \le i \le j \le n \\ j - i < 20}} \frac{1}{\sqrt{j - i + 1}} \Bigl| \sum_{l=i}^{j} \sigma^{-1}(Y_l - u_l) \Bigr| \le q.$$

For the computation of $\hat u_{\mathrm{SMRE}}$ in the rightmost panel of Figure 2 we set $q = q_N(\alpha) = 2.9$,
which corresponds to a small value of $1 - \alpha \approx 0.01$ in order to avoid oversmoothing. The value
of $\alpha$ was determined by simulations of the statistic $T_N(\varepsilon)$ (see also [H] for an asymptotic
representation of the distribution of $T_N(\varepsilon)$). Indeed, the result is visually appealing: the
kinks, jumps and peaks are strikingly well recovered, both in location and height, and the
smooth parts of the signal exhibit no artefacts. Also the corresponding qq-plot confirms that
there are hardly any outliers in the residuals $Y - \hat u_{\mathrm{SMRE}}$, which indicates that oversmoothing
is limited to a reasonable amount. Again, this is all the more remarkable as the regularisation
functional $J$ is known to usually blur edges, peaks and kinks.
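
The size of the interval system and the simulation of $T_N(\varepsilon)$ mentioned above can be sketched as follows. The counting argument is exact; the Monte Carlo part is only an illustration with a hypothetical number of 200 replications and a hypothetical helper `t_stat`, and it is not the simulation used for the figures.

```python
import numpy as np

n, max_len = 1024, 20

# Intervals {i, ..., j} with 1 <= i <= j <= n and j - i < max_len:
# for each d = j - i in {0, ..., max_len - 1} there are n - d of them.
N = sum(n - d for d in range(max_len))
print(N)

def t_stat(eps, max_len):
    """Statistic (6) evaluated at a noise vector, using prefix sums."""
    csum = np.concatenate(([0.0], np.cumsum(eps)))
    return max(np.abs(csum[m:] - csum[:-m]).max() / np.sqrt(m)
               for m in range(1, max_len + 1))

rng = np.random.default_rng(0)
samples = np.array([t_stat(rng.standard_normal(n), max_len) for _ in range(200)])

# Empirical probability that T_N(eps) stays below the threshold 2.9.
frac_below = np.mean(samples <= 2.9)
print(frac_below)
```

With more replications, the empirical distribution of `samples` can be used to read off the level $\alpha$ that belongs to a given threshold, or the threshold that belongs to a given level.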

Summarising, it becomes evident that the SMRE approach outperforms the standard
method that employs the global quadratic fidelity. In particular, this example shows that
plugging the MR-statistic $T_N$ into (3) results in an estimation scheme that regularises in
a locally adaptive manner. Aside from the specific choice (7), any other convex regularisation
functional $J$ can of course be "localised" in this way, for example the squared 2-norm or
the total-variation semi-norm. It has turned out, however, that for the present example (7) is
preferable since it accounts well for the smooth parts in the signal (no "staircasing" effect).

We finally remark that the computation of $\hat u_{\mathrm{SMRE}}$ relies on the algorithmic framework for
SMRE as developed in [39] and will not be discussed here.

The regularisation scheme (3) with the MR-statistic $T_N$ as in (6) was studied in [27] for the
specific case of nonparametric regression in one space dimension and the total-variation
semi-norm as regularisation functional $J$. In this paper we will show that the general formulation
in (3) reveals the SMRE as a powerful regularisation method far beyond this situation: it can
be extended to space dimensions larger than one as well as to inverse problems with general $K$
as in (2), including deconvolution problems. Furthermore, we present very general consistency
and convergence rate results for SMRE in the context of statistical inverse problems and
discuss their impact on particular applications. To the best of our knowledge, results of this type
have not been obtained before. It is necessary to assume additional regularity of the true
solution $u^\dagger$ of (1) in order to obtain convergence rate results. In the context of inverse
problems, this is usually done by formulating so-called source conditions. These determine
smoothness classes of solutions of (1) that guarantee risk bounds and fast convergence of
the estimator to the true signal. In this work we study the standard source conditions used
in the framework of Bregman-divergences, which yield for each penalty functional $J$ in (3) one
specific smoothness class. As shown in Section 4, this can be considered as a generalisation
of the Sobolev class of functions with exponent 1/2. The formulation of conditions that give
optimal convergence rates in a scale of smoothness classes for a general but fixed $J$ is, to our
knowledge, still open and will not be treated in this work.
This paper is organised as follows. After reviewing some basic definitions from convex
analysis and the theory of inverse problems in Section 2, we develop in Section 3.1
a general scheme for estimation for the statistical inverse problem (2) based on the convex
optimisation problem (3). In Section 3.2 we then prove consistency and convergence rate
results in terms of the Bregman-divergence w.r.t. the regularisation functional $J$. In Section
4 we study the performance of the so constructed estimators for various examples, such as the
Gaussian sequence model (Section 4.1) and linear inverse regression problems (Section 4.2).
In Section 4.3 we investigate the particular situation when the regularisation functional $J$
is chosen to be the total-variation semi-norm, which has a particular appeal for imaging
problems. Finally, some examples that illustrate the notions of source condition and
Bregman-divergence are given in Appendix A, and the proofs of the main results as well as some
auxiliary lemmata are collected in Appendix B.



2. Basic Definitions 

In this section we summarise some relevant definitions and assumptions needed throughout 
the paper. 

Assumption 2.1. (i) $U$ and $V$ denote separable Hilbert spaces. The norms on $U$ and $V$
are not further specified and will always be denoted by $\|\cdot\|$, since the meaning is clear
from the context.

(ii) Let $J : U \to \overline{\mathbb{R}}$ be a convex functional from $U$ into the extended real numbers
$\overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$. The domain of $J$ is defined by

$$D(J) = \{u \in U : J(u) \neq \infty\}.$$

$J$ is called proper if $D(J) \neq \emptyset$ and $J(u) > -\infty$ for all $u \in U$. Throughout this paper
$J$ denotes a convex, proper and lower semi-continuous (l.s.c.) functional with dense
domain $D(J)$.

(iii) $K : U \to V$ is a linear and bounded operator. By $\operatorname{ran}(K) = K(U)$ we denote the range
of $K$ and by $K^* : V \to U$ the adjoint operator of $K$.

In the course of this paper we will frequently make use of tools from convex analysis. For 
a standard reference see [M] . 

• The sub-differential (or generalised derivative) $\partial J(u)$ of $J$ at $u$ is the set of all elements
$p \in U$ satisfying

$$J(v) - J(u) - \langle p, v - u \rangle \ge 0 \quad \text{for all } v \in U.$$

The domain $D(\partial J)$ of the sub-differential consists of all $u \in U$ for which $\partial J(u) \neq \emptyset$.

• We will prove consistency of estimators with respect to the Bregman-divergence. For
$u \in D(J)$ the Bregman-divergence of $J$ between $u$ and $v$ is defined by

$$D_J(v, u) = J(v) - J(u) - J'(u)(v - u),$$

where $J'(u)(v - u)$ denotes the directional derivative of $J$ at $u$ in direction $v - u$. The
directional derivative is defined as

$$J'(v)(w) = \lim_{h \to 0^+} \frac{J(v + hw) - J(v)}{h}$$

and is well defined for convex functions (possibly with values in $[-\infty, \infty]$).

• For $u \in D(\partial J)$ the Bregman-divergence of $J$ between $u$ and $v$ w.r.t. $\xi \in \partial J(u)$ is
defined as

$$D_J^{\xi}(v, u) = J(v) - J(u) - \langle \xi, v - u \rangle.$$

The following basic estimates hold:

$$0 \le D_J(v, u) \le D_J^{\xi}(v, u) \quad \text{for all } \xi \in \partial J(u).$$
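
To make the definitions above concrete, the following sketch evaluates the Bregman-divergence for two differentiable convex functionals on $\mathbb{R}^2$, in which case the directional derivative reduces to an inner product with the gradient. The helper `bregman_divergence` and both example functionals are assumptions chosen for illustration; they are not the functionals studied in the appendix.

```python
import numpy as np

def bregman_divergence(J, grad_J, v, u):
    """D_J(v, u) = J(v) - J(u) - <grad J(u), v - u> for a differentiable convex J,
    where the directional derivative J'(u)(v - u) equals <grad J(u), v - u>."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    return J(v) - J(u) - np.dot(grad_J(u), v - u)

# Squared norm: the divergence is symmetric and equals 0.5 * ||v - u||^2.
J2 = lambda x: 0.5 * np.dot(x, x)
grad_J2 = lambda x: x

# Entropy-type functional on positive vectors: the divergence is not symmetric.
J_ent = lambda x: np.sum(x * np.log(x))
grad_J_ent = lambda x: np.log(x) + 1.0

u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])

d_quad = bregman_divergence(J2, grad_J2, v, u)
d_ent_vu = bregman_divergence(J_ent, grad_J_ent, v, u)
d_ent_uv = bregman_divergence(J_ent, grad_J_ent, u, v)
print(d_quad, d_ent_vu, d_ent_uv)
```

Both divergences are non-negative, but only the quadratic one is symmetric in its arguments.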

Remark 2.1. Clearly, the Bregman-divergence does not define a (quasi-)metric on $U$: it is
non-negative, but in general it is neither symmetric nor does it satisfy the triangle inequality. The
big advantage, however, of formulating asymptotic results (such as consistency or convergence
rates) w.r.t. the Bregman-divergence for estimators defined by a variational scheme of
type (3) is the fact that the regularising properties of the penalty functional $J$ are
incorporated automatically. If, for example, the functional $J$ is slightly more than strictly
convex, it was shown in [61] that convergence w.r.t. the Bregman-divergence already implies
convergence in norm. If, however, $J$ fails to be strictly convex (e.g. if it is of linear growth),
it is in general hard to establish norm-convergence results, but convergence results w.r.t. the
Bregman-divergence, though weaker, may still be at hand. In Examples A.1||A.4 as well as in
Section 4.3 we compute the Bregman-divergence for some particular choices of $J$.

The concept of Bregman-divergence in optimisation was introduced in [TTj and has recently
attracted much attention, e.g. in the inverse problems community [see[Hl ,l^0ll24j or in statistics
and machine learning |23l [501 EZ].

Next, we introduce different classes of solutions for Equation (1) discussed in this paper.

Definition 2.2. (i) If there exists a solution $u \in D(J)$ of (1), then $g$ is called attainable.

(ii) An element $u^\dagger \in D(J)$ is called a $J$-minimising solution of (1) if $u^\dagger$ solves (1) and

$$J(u^\dagger) = \inf\{J(u) : Ku = g\}.$$

(iii) Let $g \in V$ be attainable. An element $p \in V$ is called a source element if there exists a
$J$-minimising solution $u^\dagger$ of (1) such that

$$K^* p \in \partial J(u^\dagger). \tag{10}$$

Then, we say that $u^\dagger$ satisfies the source condition (10).



It is well known in the theory of inverse problems with deterministic noise [see iSSj that the
source condition (10) is sufficient for establishing convergence rates for regularisation methods.
It can be understood as a regularity condition for $J$-minimising solutions of Equation (1). Put
differently, for each regularisation functional $J$ and each operator $K$, the source condition (10)
characterises one particular smoothness class of solutions of (1) for which fast reconstruction
is guaranteed. We clarify the notions of Bregman-divergence and source condition by some
examples in Appendix A.

Under fairly general conditions, existence of $J$-minimising solutions can be guaranteed. We
formalise these conditions in the following result; we omit the proof since it is
standard in convex analysis [see 34, Chap. II Prop. 2.1].

Proposition 2.3. Let $g \in V$ be attainable and assume that for all $c \in \mathbb{R}$ the sets

$$\{u \in U : \|Ku\| + J(u) \le c\} \tag{11}$$

are bounded in $U$. Then, there exists a $J$-minimising solution of (1).
3. A General Scheme for Estimation

In this section we construct a family of estimators $\hat u$ for $J$-minimising solutions (cf.
Definition 2.2) of Equation (1) from noisy data $Y$ given by the white noise model (2). We define
the estimators in a variational framework and prove consistency as well as convergence rates
in terms of the Bregman-divergence w.r.t. $J$.

3.1. MR-Statistic and SMR-Estimation. We introduce a class of similarity measures in
order to determine whether the residuals $Y - K\hat u$ for a given estimator $\hat u \in U$ resemble a
white noise process or not. To this end we will consider the extreme-value distribution of
projections of the residuals onto a predefined collection of lines in $V$. More precisely, assume
that

$$\Phi = \{\phi_1, \phi_2, \ldots\} \subset \operatorname{ran}(K) \setminus \{0\}$$

is a fixed dictionary such that $\|\phi_n\| \le 1$ for all $n \in \mathbb{N}$. For the sake of simplicity, we will
frequently make use of the abbreviation $\phi_n^* = \phi_n / \|\phi_n\|$.

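Definition 3.1. Let $\{t_N : \mathbb{R}^+ \times (0, 1] \to \mathbb{R}\}_{N \in \mathbb{N}}$ be a sequence of functions that satisfy the
following conditions:

(i) For all $r \in (0, 1]$, the function $s \mapsto t_N(s, r)$ is convex, increasing and Lipschitz-continuous
with Lipschitz constants $L_{N,r}$ such that $L_{N,r} \le L < \infty$ for all $N \in \mathbb{N}$ and

$$0 > \lambda_N(r) := \inf_{s \in \mathbb{R}^+} t_N(s, r) > -\infty. \tag{12}$$

(ii) There exist constants $c_1, c_2 > 0$ and $\sigma_0 \in (0, 1)$ such that for all $0 < \sigma < \sigma_0$

$$t_N(s, r) \ge c_1 s + c_2\, t_N(\sigma s, r) \quad \text{for } (s, r) \in \mathbb{R}^+ \times (0, 1] \text{ and } N \in \mathbb{N}. \tag{13}$$

Then, for $N \in \mathbb{N}$, the mapping $T_N : V \to \mathbb{R}$ defined by

$$T_N(v) = \max_{1 \le n \le N} t_N\bigl(|\langle v, \phi_n^* \rangle|, \|\phi_n\|\bigr)$$

is called a multiresolution statistic (MR-statistic).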



Remark 3.1. Let $\{t_N\}_{N \in \mathbb{N}}$ be a sequence of functions satisfying (i) and (ii) in Definition 3.1.
For a fixed $N \in \mathbb{N}$ the mappings $\mu_n : V \to \mathbb{R}$ defined by

$$\mu_n(v) = t_N\bigl(|\langle v, \phi_n^* \rangle|, \|\phi_n\|\bigr)$$

can be interpreted as the average of the signal $v$ restricted to the subspace spanned by $\phi_n$.
With $\mu_n$ as above, the MR-statistic $T_N(v)$ in Definition 3.1 takes the form (4) and hence can
be considered to measure the maximal local average of $v$ w.r.t. the dictionary $\{\phi_1, \ldots, \phi_N\}$.

Definition 3.1 allows for a vast class of MR-statistics and the conditions in (i) and (ii)
appear rather technical. The following example sheds some light on a special class of
MR-statistics that will be studied in more detail later on. We note, however, that our general
setting also applies to more involved statistics, as e.g. introduced in [32], [33].

Example 3.2. Assume that $\{f_N : (0, 1] \to \mathbb{R}\}_{N \in \mathbb{N}}$ is a sequence of positive functions and
define

$$t_N(s, r) := s - f_N(r).$$

Then, the assumptions in Definition 3.1 are satisfied; to be more precise, we can set $L = 1$,
$\lambda_N(r) = -f_N(r)$, $c_1 = 1 - \sigma_0$ and $c_2 = 1$, where $\sigma_0 \in (0, 1)$ is arbitrary but fixed.
Moreover, for a fixed $N \in \mathbb{N}$, the average functions $\mu_n : V \to \mathbb{R}$ in Remark 3.1 read as

$$\mu_n(v) = |\langle v, \phi_n^* \rangle| - f_N(\|\phi_n\|).$$
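
The constants stated in Example 3.2 can be checked directly. The following short derivation is a sketch under the assumption that $s$ ranges over $[0, \infty)$ and that the constants are $L = 1$, $c_1 = 1 - \sigma_0$ and $c_2 = 1$.

```latex
% Lipschitz constant and infimum in (12):
|t_N(s,r) - t_N(s',r)| = |s - s'| \quad\Longrightarrow\quad L = 1,
\qquad
\inf_{s \ge 0} \bigl(s - f_N(r)\bigr) = -f_N(r) < 0 .

% Condition (13) with c_1 = 1 - \sigma_0 and c_2 = 1, for 0 < \sigma < \sigma_0:
t_N(s,r) - c_1 s - c_2\, t_N(\sigma s, r)
  = \bigl(s - f_N(r)\bigr) - (1 - \sigma_0)\, s - \bigl(\sigma s - f_N(r)\bigr)
  = (\sigma_0 - \sigma)\, s \;\ge\; 0 .
```

The terms $f_N(r)$ cancel, so (13) holds irrespective of the particular choice of the positive functions $f_N$.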

For a white noise process $\varepsilon : V \to L^2(\Omega, \mathfrak{A}, \mathbb{P})$ and $N \in \mathbb{N}$, consider the random variable

$$T_N(\varepsilon) = \max_{1 \le n \le N} t_N\bigl(|\varepsilon(\phi_n^*)|, \|\phi_n\|\bigr).$$

Then, for a level $\alpha \in (0, 1)$ we denote the $(1 - \alpha)$-quantile of $T_N(\varepsilon)$ by $q_N(\alpha)$, that is,

$$q_N(\alpha) := \inf\{q \in \mathbb{R} : \mathbb{P}(T_N(\varepsilon) \le q) \ge 1 - \alpha\}. \tag{14}$$

Our key paradigm is that an estimator $\hat u$ for a solution of (1) fits the data $Y$ sufficiently
well if the statistic $T_N(\sigma^{-1}(Y - K\hat u))$ does not exceed the threshold $q_N(\alpha)$ ($\alpha \in (0, 1)$ and $N \in \mathbb{N}$
fixed). Among all those estimators we shall pick the most parsimonious by minimising the
functional $J$.
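
When the distribution of $T_N(\varepsilon)$ is only available through simulation, the quantile (14) and the resulting admissibility check can be sketched as follows. The names `empirical_quantile` and `is_admissible` are hypothetical, and the argument `statistic` stands for any user-supplied implementation of an MR-statistic; none of this is prescribed by the theory.

```python
import numpy as np

def empirical_quantile(samples, alpha):
    """Smallest sample value q whose empirical probability P(T <= q)
    is at least 1 - alpha, mirroring the infimum in (14)."""
    t = np.sort(np.asarray(samples, dtype=float))
    m = t.size
    # The empirical distribution function at t[k] is at least (k + 1) / m.
    k = int(np.ceil((1.0 - alpha) * m)) - 1
    return t[max(k, 0)]

def is_admissible(y, Ku, sigma, statistic, q):
    """Check the constraint T_N(sigma^{-1} (Y - K u)) <= q for a candidate u,
    given its image Ku and a callable `statistic` implementing T_N."""
    residual = (np.asarray(y, dtype=float) - np.asarray(Ku, dtype=float)) / sigma
    return statistic(residual) <= q

# Small deterministic illustration with four simulated values of T_N(eps).
sims = [4.0, 1.0, 3.0, 2.0]
q_half = empirical_quantile(sims, alpha=0.5)
q_quarter = empirical_quantile(sims, alpha=0.25)

# Hypothetical statistic: largest absolute standardised residual.
max_abs = lambda r: np.abs(r).max()
ok = is_admissible([1.0, 2.0], [1.1, 1.9], 0.1, max_abs, q=q_quarter)
```

Taking an order statistic rather than an interpolated quantile keeps the defining property of (14), namely that the empirical probability of not exceeding the returned value is at least $1 - \alpha$.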

Definition 3.3. Let $N \in \mathbb{N}$ and $\alpha \in (0, 1)$. Moreover, assume that $T_N$ is an MR-statistic
and that $Y$ is given by (2). Then every element $\hat u_N(\alpha) \in U$ solving the convex optimisation
problem (3) is called a statistical multiresolution estimator (SMRE).

An SMRE $\hat u_N(\alpha)$ depends on the regularisation parameters $N \in \mathbb{N}$ and $\alpha \in (0, 1)$ that
determine the admissible region $\mathcal{A}_N(\alpha)$ in (5). In order to guarantee existence of a solution of
the convex problem in Definition 3.3, that is, existence of an SMRE, it is necessary to impose
further standard assumptions:

Assumption 3.4. There exists $N_0 \in \mathbb{N}$ such that for all $c \in \mathbb{R}$ the sets

$$\Lambda(c) = \Bigl\{u \in U : \max_{1 \le n \le N_0} |\langle Ku, \phi_n^* \rangle| + J(u) \le c\Bigr\}$$

are bounded in $U$.

Assumption 3.4 guarantees weak compactness of the level sets of the objective functional $J$
restricted to the admissible region $\mathcal{A}_N(\alpha)$. We note that if $J$ is strongly coercive (e.g. when
$J$ is as in Example A.1 or A.4), then Assumption 3.4 is satisfied without any restrictions on
the operator $K$. If $J$ lacks strong coercivity (as is e.g. the case with the total-variation
semi-norm studied in Section 4.3), additional properties of $K$ are required in order to meet
Assumption 3.4.

Application of standard arguments from convex optimisation yields

Proposition 3.5. Assume that Assumption 3.4 holds and let $N \ge N_0$ and $\alpha \in (0, 1]$. Then,
an SMRE $\hat u_N(\alpha)$ exists.

Finally, we note that Assumption 3.4 already implies the requirements in Proposition 2.3
and consequently existence of $J$-minimising solutions.

3.2. Consistency and Convergence Rates. We investigate the asymptotic behaviour of
$\hat u_N(\alpha)$ as the noise level $\sigma$ in (2) tends to zero. According to the reasoning following Definition
3.3, the parameters $N \in \mathbb{N}$ and $\alpha \in (0, 1)$ can be interpreted as regularisation parameters
and have to be chosen accordingly: the model parameter $N$ has to be increased in order to
guarantee a sufficiently accurate approximation of the image space $V$, whereas the test level $\alpha$
tends to $0$ such that the true solution (asymptotically) satisfies the constraints of (3) almost
surely. We formulate consistency and convergence rate results by means of the
Bregman-divergence of the SMRE $\hat u_N(\alpha)$ and a true solution $u^\dagger$ in terms of almost sure convergence.

Throughout this section we shall assume that $\{\sigma_k\}_{k \in \mathbb{N}}$ is a sequence of positive noise levels
in (2) such that $\sigma_k \to 0^+$ as $k \to \infty$. Moreover, we assume that $\{\alpha_k\}_{k \in \mathbb{N}} \subset (0, 1)$ is a sequence
of significance levels and that $N_k \ge N_0$ is such that

$$\sum_{k=1}^{\infty} \alpha_k < \infty \quad \text{and} \quad \lim_{k \to \infty} N_k = \infty. \tag{15}$$



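Theorem 3.6. Assume that Assumptions 2.1 and 3.4 hold. Let further $u^\dagger$ be a $J$-minimising
solution of (1) where $g \in \operatorname{span}\Phi$ and assume that

$$\sup_{k \in \mathbb{N}} T_{N_k}(\varepsilon) < \infty \ \text{a.s.} \quad \text{and} \quad \zeta_k := \sigma_k \max\Bigl(-\inf_{1 \le n \le N_k} \lambda_{N_k}(\|\phi_n\|),\ \sqrt{-\log \alpha_k}\Bigr) \to 0. \tag{16}$$

Then, for $\hat u_k := \hat u_{N_k}(\alpha_k)$ one has

$$\sup_{k \in \mathbb{N}} \|\hat u_k\| < \infty, \quad J(\hat u_k) \to J(u^\dagger) \quad \text{and} \quad D_J(u^\dagger, \hat u_k) \to 0 \quad \text{a.s.} \tag{17}$$

as well as

$$\limsup_{k \to \infty} \max_{1 \le n \le N_k} \frac{|\langle K\hat u_k - g, \phi_n^* \rangle|}{\zeta_k} < \infty \quad \text{a.s.} \tag{18}$$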



Theorem 3.6 states that if, for a given vanishing sequence of noise levels $\sigma_k$, suitable (in
the sense of (16)) sequences of regularisation parameters $N_k$ and $\alpha_k$ can be constructed, then
the sequence of corresponding SMREs $\hat u_k$ converges to a true $J$-minimising solution $u^\dagger$ w.r.t.
the Bregman-divergence. We note that the assumption on the boundedness of the MR-statistic
$T_N(\varepsilon)$ is crucial and in general non-trivial to show.

It is well known that without further regularity restrictions on $u^\dagger$, the speed of convergence
in (17) can be arbitrarily slow. Source conditions as in Definition 2.2 (iii) are known to
constitute sufficient regularity conditions with quadratic fidelity (cf. [5^ [3 EI]). In our
situation, where the fidelity controls the maximum over all residuals, we additionally have to
assume that the source elements exhibit certain approximation properties:

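Assumption 3.7. There exists a $J$-minimising solution $u^\dagger$ of (1) that satisfies the source
condition (10) with source element $p^\dagger$. Moreover, for $n, N \in \mathbb{N}$ there exist $b_{n,N} \in \mathbb{R}$ such that

$$\operatorname{err}_N(p^\dagger) := \Bigl\| p^\dagger - \sum_{n=1}^{N} b_{n,N}\, \phi_n^* \Bigr\| \to 0 \ (N \to \infty) \quad \text{and} \quad \sup_{N \in \mathbb{N}} \sum_{n=1}^{N} |b_{n,N}| < \infty. \tag{19}$$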



Remark 3.2. (i) Assumption 3.7 amounts to saying that there exists a $J$-minimising solution $u^\dagger$
that satisfies the source condition (10) with a source element $p^\dagger$ that can be approximated
sufficiently well by the dictionary $\Phi$ in use. From (10) it becomes clear that we can always
assume that $p^\dagger \in \operatorname{ran}(K)$, such that the first condition in (19) is in fact not very
restrictive.

(ii) Good estimates of approximation errors for non-orthogonal dictionaries $\Phi$ are hard to
come up with in general. Examples of non-orthogonal dictionaries where such estimates
are available are wavelet [25] and curvelet [E] frames.

(iii) It is important to note that, given prior information on the true solution $u^\dagger$, the
conditions in Assumption 3.7 may indicate whether a given dictionary is well suited for the
reconstruction of $u^\dagger$ or not. As we will see in Section 4, a priori information on the
smoothness of $u^\dagger$ can typically be employed.

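Theorem 3.8. Let the requirements of Theorem 3.6 be satisfied and assume further that
Assumption 3.7 holds with $g \in \operatorname{span}\Phi$. If $\eta_k := \max(\zeta_k, \operatorname{err}_{N_k}(p^\dagger)) \to 0$, with $\zeta_k$ as in (16), then

$$\limsup_{k \to \infty} \frac{D_J^{K^* p^\dagger}(\hat u_k, u^\dagger)}{\eta_k} < \infty \quad \text{and} \quad \limsup_{k \to \infty} \max_{1 \le n \le N_k} \frac{|\langle K\hat u_k - g, \phi_n^* \rangle|}{\eta_k} < \infty \quad \text{a.s.} \tag{20}$$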



Remark 3.3. The convergence rate result in Theorem 3.8 is rather general, in the sense that
the rate function $\eta_k$ in (20) has to be determined for each choice of $K$, $J$ and $\Phi$ separately. We
outline a general procedure how this can be done in practice: assume that $u^\dagger$ is a $J$-minimising
solution of (1) that satisfies Assumption 3.7 with a source element $p^\dagger$.

(i) The sequence $\{-\inf_{1 \le n \le N} \lambda_N(\|\phi_n\|)\}_{N \in \mathbb{N}}$ is positive according to (12). Hence

$$N_k := \min\Bigl\{N \in \mathbb{N} : \operatorname{err}_N(p^\dagger) \le -\sigma_k \inf_{1 \le n \le N} \lambda_N(\|\phi_n\|)\Bigr\}$$

is well-defined and, since $\sigma_k$ is non-increasing, one has $N_k \le N_{k+1}$ and $N_k \to \infty$ as
$k \to \infty$.

(ii) After setting $\eta_k = -\sigma_k \inf_{1 \le n \le N_k} \lambda_{N_k}(\|\phi_n\|)$ it remains to check that the sequence of
test levels $\alpha_k = \exp\bigl(-(\kappa \eta_k / \sigma_k)^2\bigr)$ is summable (for some constant $\kappa > 0$).

For the so constructed sequences $N_k$, $\eta_k$ and $\alpha_k$, the assertions of Theorem 3.8 hold.

4. Applications and Examples 

In Section 3 we developed a general method for estimation of $J$-minimising solutions of
linear and ill-posed operator equations from noisy data. Our estimation scheme thereby
employed the MR-statistic $T_N$ (cf. Definition 3.1). In this section we will study particular
instances of MR-statistics covered by the general theory in Section 3.

• We study the case where $T_N$ constitutes the extreme-value statistic of the coefficients
w.r.t. an orthonormal dictionary $\Phi$ (Section 4.1). We show how Assumption 3.7
in this case reduces to the requirement that the true solution lies in a Sobolev
ellipsoid w.r.t. the system $\Phi$. Moreover, it will turn out that for the case when $\Phi$
denotes the eigensystem of a compact operator, SMR-estimation can be considered as
soft-thresholding.

• In Section 4.2 we drop the assumption of orthonormality and examine general
SMR-estimation w.r.t. (non-orthonormal) dictionaries that satisfy certain entropy
conditions. In particular, we will consider the case when $U$ and $V$ coincide and when
$\Phi$ consists of indicator functions w.r.t. a redundant system of subcubes in $[0, 1]^d$ (cf.
Example 1.1).

• Finally, we study the case when the penalty functional $J$ is chosen to be the
total-variation semi-norm on $U = L^2$ in Section 4.3. We shed some light on the meaning
behind the source condition (10) and the Bregman-divergence for total-variation
regularisation, complementing the examples in Appendix A. Additionally, we highlight
the implications of our general convergence rate results for image deconvolution.
Throughout this section we assume that Assumptions 2.1 and 3.4 hold. Moreover, we shall agree upon $\{\sigma_k\}_{k\in\mathbb{N}}$ being a sequence of noise levels such that $\sigma_k\to 0^+$ and that for $k\in\mathbb{N}$ there are $\alpha_k\in(0,1)$ and $N_k\in\{N_0, N_0+1,\ldots\}$ such that (15) holds.



4.1. Introductory Example: Gaussian Sequence Model. In this section we shall consider the case where the dictionary $\Phi = \{\phi_1,\phi_2,\ldots\}$ constitutes an orthonormal basis of $\mathrm{ran}(K)$. Evaluation of Equation (2) at the elements $\phi_n$ hence yields

$$y_n = \theta_n + \sigma\varepsilon_n,$$

where $Y(\phi_n) = y_n$, $\theta_n = \langle Ku,\phi_n\rangle$ and $\varepsilon_n = \varepsilon(\phi_n)$. We define the MR-statistic $T_N$ by setting $t_N(s,r) = s - \sqrt{2\log N}$ in Definition 3.1. In other words, we consider the maximum of the coefficients w.r.t. the dictionary $\Phi$, that is

$$T_N(v) = \max_{1\le n\le N}|\langle v,\phi_n\rangle| - \sqrt{2\log N}. \qquad (21)$$

Since $\{\phi_1,\phi_2,\ldots\}$ are linearly independent and normalised, it follows that the random variables $\varepsilon_1,\varepsilon_2,\ldots$ are independent and standard normally distributed. This implies that $T_N(\varepsilon)$ is bounded almost surely.
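The statistic in (21) is easy to evaluate numerically. The following is a minimal sketch, assuming a finite coefficient vector and i.i.d. standard normal noise coefficients; the function names `mr_statistic` and `simulate_noise_statistic` are illustrative and do not appear in the paper.

```python
import math
import random


def mr_statistic(coeffs):
    """Evaluate T_N(v) = max_n |<v, phi_n>| - sqrt(2 log N), cf. (21).

    `coeffs` holds the N coefficients <v, phi_n> w.r.t. the dictionary.
    """
    n = len(coeffs)
    return max(abs(c) for c in coeffs) - math.sqrt(2.0 * math.log(n))


def simulate_noise_statistic(n, seed=0):
    """Draw one realisation of T_N(eps) for i.i.d. N(0, 1) coefficients."""
    rng = random.Random(seed)
    return mr_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])


if __name__ == "__main__":
    # The realisations stay of moderate size although N grows by orders of magnitude.
    for n in (10, 100, 1000, 10000):
        print(n, round(simulate_noise_statistic(n), 3))
```

The centring term $\sqrt{2\log N}$ compensates the growth of the maximum of $N$ independent standard normal variables, which is why the printed values do not drift upwards with $N$.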



In what follows, we will apply Theorems 3.6 and 3.8 to the present case. To this end, we observe that for $\sigma > 0$ and $N\in\mathbb{N}$ it follows that

$$-\sigma\inf_{1\le n\le N}\lambda_N(\|\phi_n\|) = \sigma\sqrt{2\log N}.$$

With the above preparations, we are able to reformulate the consistency result in Theorem 3.6:

Corollary 4.1. Let $u^\dagger\in U$ be a $J$-minimising solution of (1) where $g\in\mathrm{span}\,\Phi$. Moreover, assume that $\sigma_k^2\max(\log N_k, -\log\alpha_k)\to 0$. Then, the SMRE $\hat u_k = \hat u_{N_k}(\alpha_k)$ almost surely



satisfies (17) and 



In order to apply the convergence rate result in Theorem 3.8, Assumption 3.7 has to be verified. We set $b_{n,N} = \langle p^\dagger,\phi_n\rangle$ in Assumption 3.7. Note that the expression $\mathrm{err}_N(p^\dagger)$ denotes the approximation error of the $N$-th partial Fourier-series w.r.t. $\Phi$. Thus, Assumption 3.7 is linked to absolute summability of the Fourier-coefficients w.r.t. the basis $\Phi$, i.e.

$$\sum_{n=1}^{\infty}|\langle p^\dagger,\phi_n\rangle| < \infty. \qquad (22)$$

The Bernstein-Stechkin criterion is a classical method for testing for absolute summability. We present a version suitable for our purpose in the following

Proposition 4.2. Let $p^\dagger\in V$. Then, (22) is satisfied if $\sum_{N=1}^{\infty}\mathrm{err}_N(p^\dagger)/\sqrt{N} < \infty$.

Proof. The classical version of the Bernstein-Stechkin Theorem [see e.g. 57, Thm. 7.4] states that for each $f\in L^2([0,1])$ and each ON-basis $v = \{v_1,v_2,\ldots\}$ of $L^2([0,1])$, the Fourier-coefficients of $f$ are absolutely summable, if $\sum_{N=1}^{\infty}\mathrm{err}_N(f)/\sqrt{N} < \infty$. Since each separable Hilbert space is isometrically isomorphic to $L^2([0,1])$, the assertion finally follows. □

Following the procedure outlined in Remark 3.3 (Section 3) we define

$$N_k := \inf\{N\in\mathbb{N} : \mathrm{err}_N(p^\dagger)\le\sigma_k\sqrt{2\log N}\}\quad\text{and}\quad\eta_k := \sigma_k\sqrt{2\log N_k}. \qquad (23)$$
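The choice of the truncation index and threshold in (23) can be sketched in a few lines. This is an illustration only: it assumes that the approximation error is available as a callable `err` (a hypothetical stand-in for $N\mapsto\mathrm{err}_N(p^\dagger)$), that the defining inequality is non-strict, and that the search may be capped at a finite `n_max`.

```python
import math


def truncation_index(err, sigma, n_max=10**6):
    """Smallest N with err(N) <= sigma * sqrt(2 log N), cf. (23)."""
    for n in range(1, n_max + 1):
        if err(n) <= sigma * math.sqrt(2.0 * math.log(n)):
            return n
    raise ValueError("no admissible N found below n_max")


def threshold(err, sigma):
    """Return the pair (N_k, eta_k) with eta_k = sigma * sqrt(2 log N_k)."""
    n_k = truncation_index(err, sigma)
    return n_k, sigma * math.sqrt(2.0 * math.log(n_k))


if __name__ == "__main__":
    # Illustrative decay err_N = 1 / N (an assumption, not taken from the text).
    for sigma in (0.1, 0.01, 0.001):
        print(sigma, threshold(lambda n: 1.0 / n, sigma))
```

Smaller noise levels admit a larger index $N_k$, in line with the monotonicity noted in Remark 3.3.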
Corollary 4.3. Let $g\in V$ be attainable and $u^\dagger\in U$ be a $J$-minimising solution of (1) that satisfies the source condition with a source element $p^\dagger$ such that the condition in Proposition 4.2 holds. Moreover, let $N_k$ and $\eta_k$ be defined as in (23). If

'(0,1) 



for a constant $\kappa > 0$, then the SMRE $\hat u_k = \hat u_{N_k}(\alpha_k)$ almost surely satisfies (20).



The problem of characterising those elements $p^\dagger$ that satisfy the assumption of Proposition 4.2 is a classical issue in Fourier-analysis and approximation theory. Sufficient conditions are usually formalised by characterising the decay properties of the Fourier-coefficients. In a function space setting, this leads to particular smoothness classes of functions and in the general situation can be given in terms of Sobolev ellipsoids: for constants $\beta, Q > 0$ we define $\Theta(\beta,Q)$ as the infinite-dimensional ellipsoid



e(/3,Q) 



neN 



(24) 



The Sobolev class $W(\beta,Q)\subset V$ is then defined to consist of all $v\in V$ such that $\{\langle v,\phi_n\rangle\}_{n\in\mathbb{N}}\in\Theta(\beta,Q)$ [see EH Sec. 1.10.1]. For $v\in W(\beta,Q)$ we have that Proposition 4.2 is applicable if $\beta > 1/2$.



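Example 4.4. Assume that $J(u) = \frac{1}{2}\|u\|^2$ and let $K$ be a compact operator with singular value decomposition (SVD) $\{(\psi_n,\phi_n,s_n)\}_{n\in\mathbb{N}}$. Here $\{\psi_n\}_{n\in\mathbb{N}}$ is an orthonormal basis (ONB) of $\ker(K)^\perp$, $\{\phi_n\}_{n\in\mathbb{N}}$ is an ONB of $\mathrm{ran}(K)$ and the singular values $\{s_n\}_{n\in\mathbb{N}}$ are positive and $s_n\to 0$ as $n\to\infty$. Moreover

$$K\psi_n = s_n\phi_n\quad\text{and}\quad K^*\phi_n = s_n\psi_n, \qquad (25)$$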



for all $n\in\mathbb{N}$. For $N\in\mathbb{N}$ and $\alpha\in(0,1]$ it turns out (e.g. by applying the method of Lagrangian multipliers) that the SMRE $\hat u_N(\alpha)$ with $T_N$ as in (21) is a shrinkage estimator given by

$$\hat u_N(\alpha) = \sum_{n=1}^{N} s_n^{-1}y_n\Big(1 - \frac{\sigma\,(q_N(\alpha)+\sqrt{2\log N})}{|y_n|}\Big)_+\psi_n.$$

We note that $\hat u_N(\alpha)$ is a particular instance of a soft thresholding estimator.
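The shrinkage structure can be made concrete with a short sketch. It assumes that the SMRE minimises $\frac12\|u\|^2$ subject to $|y_n - s_n\langle u,\psi_n\rangle|\le\tau$ for $1\le n\le N$, where the threshold $\tau$ collects the noise level, the quantile and the centring term of (21). The function name and the way $\tau$ is passed in are illustrative choices, not taken from the paper.

```python
def smre_soft_threshold(y, s, tau):
    """Coefficients <u, psi_n> of the minimum-norm u with |y_n - s_n u_n| <= tau.

    y   : observed coefficients y_n
    s   : positive singular values s_n
    tau : threshold, assumed to equal sigma * (q_N(alpha) + sqrt(2 log N))
    """
    coeffs = []
    for y_n, s_n in zip(y, s):
        # Soft thresholding: shrink y_n towards zero by tau, never past zero.
        shrink = max(0.0, 1.0 - tau / abs(y_n)) if y_n != 0.0 else 0.0
        coeffs.append(shrink * y_n / s_n)
    return coeffs


if __name__ == "__main__":
    print(smre_soft_threshold([3.0, -0.5, 1.0], [1.0, 0.5, 2.0], tau=1.0))
```

Each constraint only involves a single coefficient, so the minimum-norm solution is obtained coordinate-wise: coefficients with $|y_n|\le\tau$ are set to zero and the remaining ones are moved towards zero by $\tau$ before the inversion with $s_n^{-1}$.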

Now, let $u^\dagger\in U$ be a minimum-norm solution of (1) that satisfies the source condition $K^*p^\dagger = u^\dagger$ (cf. Example A.1) with source element $p^\dagger\in W(\beta,Q)$ for $Q > 0$ and $\beta > 1/2$. Then, $\mathrm{err}_N(p^\dagger)\le QN^{-\beta}$ and it follows from (23) that



Q_ 



[ — ] and % ~ cTfe\/-log cifc. 
If $\sigma_k$ has polynomial decay, we can choose a constant $\kappa > 0$ such that $\alpha_k = \exp(-(\kappa\eta_k/\sigma_k)^2)$ is summable and it follows from Corollary 4.3 and Example A.1 that

$$\limsup_{k\to\infty}\frac{\|\hat u_k - u^\dagger\|^2}{\sigma_k\sqrt{-\log\sigma_k}} < \infty\quad\text{a.s.}$$

This corresponds to the choice $\gamma_k = \sigma_k\sqrt{-\log\sigma_k}$ in [6].
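The summability of the test levels under polynomial decay can be checked numerically. The sketch below assumes $\sigma_k = k^{-a}$ and takes $\eta_k = \sigma_k\sqrt{-\log\sigma_k}$ with equality (the text only states the asymptotic order), so that $\alpha_k = \exp(-(\kappa\eta_k/\sigma_k)^2) = \sigma_k^{\kappa^2} = k^{-a\kappa^2}$. The names `level` and `partial_sum` are illustrative.

```python
import math


def level(sigma, kappa):
    """alpha = exp(-(kappa * eta / sigma)^2) with eta = sigma * sqrt(-log sigma)."""
    eta = sigma * math.sqrt(-math.log(sigma))
    return math.exp(-((kappa * eta / sigma) ** 2))


def partial_sum(a, kappa, terms):
    """Partial sum of alpha_k for sigma_k = k^(-a), k = 2, ..., terms."""
    return sum(level(float(k) ** (-a), kappa) for k in range(2, terms + 1))


if __name__ == "__main__":
    # With a = 1 and kappa = 2 the levels decay like k^(-4), a summable sequence.
    print(level(0.1, 2.0))
    print(partial_sum(1.0, 2.0, 1000))
```

Under these assumptions any $\kappa$ with $a\kappa^2 > 1$ yields a summable sequence of test levels.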
As mentioned above, sufficient conditions for the Bernstein-Stechkin criterion (cf. Proposition 4.2) in a function space setting are usually formalised in characterising smoothness properties. The following example shows how this applies to Hölder-continuity.

Example 4.5. Let $V = L^2_{\mathrm{per}}([0,1])$ be the Hilbert space of all square-integrable and periodic functions on the unit interval. Moreover, we assume that $\mathrm{ran}(K) = L^2([0,1])$ and consider the trigonometric basis

$$\phi_{2n} = \cos(n\pi x)\quad\text{and}\quad\phi_{2n+1} = \sin(n\pi x).$$

Assume that $p^\dagger\in\mathcal{H}_\beta([0,1])\cap V$ (cf. Definition B.4) with $\beta > 1/2$. Then we have that $\mathrm{err}_N(p^\dagger)\le QN^{-\beta}\log N$ for a suitable constant $Q > 0$ and therefore it follows from Proposition 4.2 that (22) holds.

Hence, if $u^\dagger$ is a $J$-minimising solution of (1) that satisfies the source condition (10) with source element $p^\dagger\in\mathcal{H}_\beta([0,1])$ and if the sequences $N_k$, $\eta_k$ and $\alpha_k$ are chosen as in Example 4.4, then $\hat u_k = \hat u_{N_k}(\alpha_k)$ almost surely satisfy (20).



Remark 4.1. i) The assertions of Example 4.5 still hold if the trigonometric basis is replaced by any other orthonormal basis $\{\phi_n\}_{n\in\mathbb{N}}$ of $\mathrm{ran}(K)$ such that the Bernstein-Stechkin criterion in Proposition 4.2 is satisfied. This holds for example for a vast class of orthonormal wavelet bases of $L^2([0,1])$ as studied in [21].
ii) For the trigonometric basis in Example 4.5, the Bernstein-Stechkin criterion in Proposition 4.2 can be replaced by the requirement that $p^\dagger\in\mathcal{H}_\beta([0,1])$ for any $\beta > 0$ is additionally of bounded variation [see 69, Vol. I, Thm. 3.6].



4.2. Non-orthogonal Models. In contrast to Section 4.1 where we considered orthonormal dictionaries, we will now focus on more general (non-orthonormal) systems. In other words, we consider sequences

$$\Phi = \{\phi_1,\phi_2,\ldots\}\subset\mathrm{ran}(K)\setminus\{0\}$$

and assume that $\|\phi_n\|\le 1$ for all $n\in\mathbb{N}$. Moreover, we will make use of the MR-statistic $T_N$ (cf. Definition 3.1) defined by

$$t_N(s,r) = s - \sqrt{-2\gamma\log r},\qquad (s,r)\in\mathbb{R}_+\times(0,1] \qquad (26)$$

where $\gamma > 0$ is some constant. As outlined in Example 3.2, one verifies that $t_N(s,r)$ satisfies the assumptions of Definition 3.1. In particular, we find that $\lambda_N(r) = -\sqrt{-2\gamma\log r} > -\infty$ for all $r\in(0,1]$. The parameter $\gamma$ that appears in (26) has to be chosen appropriately in dependence on $\Phi$ in order to guarantee that the MR-statistic $T_N(\varepsilon)$ is bounded almost surely. A sufficient condition on $\gamma$ has for example been given in [33, Thm 7.1]:



Proposition 4.6. If there exist constants $A, B > 0$ such that



D{u6,{^ G $ 



< 6}) < 



for all u,6 e (0, 1] 



(27) 



then almost surely $\sup_{N\in\mathbb{N}}T_N(\varepsilon) < \infty$. Here $D$ denotes the capacity number (cf. Definition B.6).



Corollary 4.7. Let $u^\dagger\in U$ be a $J$-minimising solution of (1) where $g\in\mathrm{span}\,\Phi$ and $\gamma > 0$ be chosen such that the assumption of Proposition 4.6 is satisfied. Moreover, assume that

(t| min( min 



l<n<Nk 



log 



-logOfc) 



0. 



Then, the SMRE $\hat u_k = \hat u_{N_k}(\alpha_k)$ almost surely satisfies (17).






In order to apply the convergence rate results in Theorem 3.8, it is necessary that a $J$-minimising solution $u^\dagger$ of (1) satisfies the source condition (10) with a source element that can be approximated by the dictionary $\Phi$ sufficiently well (cf. Assumption 3.7). We illustrate the assertion of Theorem 3.8 when $U = V = L^2([0,1]^d)$ ($d\ge 1$) and when $\Phi$ consists of a countable selection of indicator functions on cubes in $[0,1]^d$ (cf. Example 1.1).

First, we shall examine when Proposition 4.6 holds. To this end, we will focus first on the (uncountable) collection $\Phi_s$ of indicator functions on cubes in $[0,1]^d$. Then, according to Proposition B.8, the assumptions of Proposition 4.6 are satisfied for $\Phi = \Phi_s$ and $\gamma = d$. Particularly, it follows that the assertion of Proposition 4.6 also holds for arbitrary (countable) sub-systems $\Phi\subset\Phi_s$; that is, the statistic

$$T_N(\varepsilon) = \max_{1\le n\le N}|\varepsilon(\chi_{Q_n})| - \sqrt{-d\log|Q_n|}\qquad\text{where }\chi_{Q_n}\in\Phi$$

stays bounded a.s. as $N\to\infty$ (note here that $\|\chi_{Q_n}\| = \sqrt{|Q_n|}$).
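For illustration, the multiresolution statistic over indicator functions can be evaluated on a discretised version of the unit interval ($d = 1$) with dyadic intervals as cubes. The sketch assumes that the interval is split into $n = 2^J$ cells, that `z` holds the rescaled noise in each cell (i.i.d. standard normal), and that the noise is evaluated at normalised indicator functions; these conventions are assumptions of the sketch, not statements of the paper.

```python
import math
import random


def dyadic_mr_statistic(z):
    """max over dyadic intervals Q of |eps(chi_Q / ||chi_Q||)| - sqrt(-log |Q|).

    z : list of length n = 2^J; z[i] is the rescaled noise in cell i, so that
        the normalised local average over m cells is sum(z) / sqrt(m).
    """
    n = len(z)
    best = -math.inf
    length = n
    while length >= 1:
        # |Q| = length / n; penalty sqrt(-d log |Q|) with d = 1.
        penalty = math.sqrt(max(0.0, -math.log(length / n)))
        for start in range(0, n, length):
            local = sum(z[start:start + length]) / math.sqrt(length)
            best = max(best, abs(local) - penalty)
        length //= 2
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    for cells in (16, 256, 4096):
        noise = [rng.gauss(0.0, 1.0) for _ in range(cells)]
        print(cells, round(dyadic_mr_statistic(noise), 3))
```

Small intervals receive a large penalty, which balances the fact that there are many of them; the printed values therefore remain of the same order when the resolution increases.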

Next, we study Assumption 3.7 in the present setting. Let $\mathcal{P} = \{Q_1, Q_2,\ldots\}$ be a countable system of cubes and set $\Phi = \{\chi_{Q_n} : n\in\mathbb{N}\}$. We shall assume that $\mathcal{P}$ satisfies the conditions of Lemma B.5 (where $\Omega = [0,1]^d$ and $A_i = Q_i$ for $i\in\mathbb{N}$). Let $\{n_l\}_{l\in\mathbb{N}}$ and $\{\delta_l\}_{l\in\mathbb{N}}$ be defined accordingly. Moreover, we define


inf 

ni<j<ni+i 



inf 

ni<j<ni+i 



\XQj 



where we assume that {ezj^gjs^ is non-increasing. This means that we partition the set [0, 1]"^ 
into disjoint sub-cubes {-^i}n;<i<"i+i ^^^^ i^^ scale) is bounded by [ez,^;]. It is more 

natural to formulate convergence rate results in terms of the total number $m$ of used scales rather than in the total number of sub-cubes $N(m) = n_{m+1}$. Following Remark 3.3 and applying Lemma B.5 we therefore define for a given continuous function $p^\dagger : [0,1]^d\to\mathbb{R}$



nik 



inf <^ m G N 



m + 1 



E 



u=0 



< 



-2crfclogem| and rjk := o-fci/-2 loge^fc- 



(28) 



Here $\omega(\cdot,p^\dagger)$ denotes the modulus of continuity of $p^\dagger$ (cf. Definition B.4). With this and the general convergence rate result in Theorem 3.8 we immediately obtain



Corollary 4.8. Let $u^\dagger\in L^2([0,1]^d)$ be a $J$-minimising solution of (1) where $g\in\mathrm{span}\,\Phi$ and that satisfies the source condition (10) with source element $p^\dagger\in C([0,1]^d)$. Moreover, let $m_k$ and $\eta_k$ be defined as in (28). If



lim r]k 

k—^oo 



and '■= e 



^(0,1) 



for a constant $\kappa > 0$, then the SMRE $\hat u_k = \hat u_{N(m_k)}(\alpha_k)$ almost surely satisfy (20).



Example 4.9. We consider the system of all dyadic partitions $\mathcal{P} = \mathcal{P}_2$ of $[0,1]^d$ as in Example B.9. In particular, we note that the assumptions of Lemma B.5 are fulfilled with $n_l =$

Qip"^) > such that 



(2rf('+i) - l)/(2'^ - 1), 61 = and = 2-''^/2. 

If p^ G 'W/3([0, 1]"') for < /? < 1, then there exists a constant Q 
uj{6i,p^) < 



/3 



This shows that 
m + 1 



Em 
V- 



'('^..Pt) 



< Q2rf^(22/5 _ 1) 



m -|- 1 



22/3(m+l) _ 1 
for $m\in\mathbb{N}$ large enough. From this and (28) it is easy to see that



1 , (QH\2"'-l) \ ^ r-. 

2/3 log 2 V dlog2af. J 



'k 

Thus, if there exists a constant $\kappa > 0$ such that



_ / ■•■Ik 2 



is summable and if the true $J$-minimising solution $u^\dagger$ satisfies the source condition (10) with source element $p^\dagger\in\mathcal{H}_\beta([0,1]^d)$, then it follows that the SMRE $\hat u_k = \hat u_{N(m_k)}(\alpha_k)$ almost surely satisfy (20) with $\eta_k = \sigma_k\sqrt{-\log\sigma_k}$.



4.3. TV-Regularisation. In this section we will study SMR-estimation for the special case where $J$ denotes the total-variation semi-norm of measurable, bi-variate functions. This has a particular appeal for linear inverse problems arising in imaging (such as deconvolution), since discontinuities along curves (edges, that is) are not smoothed by minimising $J$.

Over the last years regularisation of (inverse) regression problems in a single space dimension invoking the total-variation semi-norm has been studied intensively and efficient numerical methods, such as the taut-string algorithm in [26], have been proposed (see e.g. [26, 27, 53] and references therein). In two or more space dimensions, however, the situation is much more involved and a generalisation is difficult [see e.g. 43]. We study here an extension to the case of space dimension 2 as well as to deconvolution by applying the results in Section 3 to the following setting:

We assume henceforth that $\Omega\subset\mathbb{R}^2$ is an open and bounded domain with Lipschitz-boundary $\partial\Omega$ and outer unit normal $\nu$. Moreover, we set $U = L^2(\Omega)$ and define $\mathrm{BV}(\Omega)$ to be the collection of $u\in U$ whose derivative $Du$ (in the sense of distributions) is a signed $\mathbb{R}^2$-valued Radon-measure with finite total-variation $|Du|$, that is

$$|Du|(\Omega) = \sup_{\psi\in C_0^1(\Omega,\mathbb{R}^2),\ |\psi|\le 1}\int_\Omega\mathrm{div}(\psi)\,u\,dx < \infty.$$

We note that the norm $\|u\|_{\mathrm{BV}} := \|u\|_{L^1} + |Du|(\Omega)$ turns $\mathrm{BV}(\Omega)$ into a Banach-space and that with this norm $\mathrm{BV}(\Omega)$ is continuously embedded into $L^2(\Omega)$. The embedding is even compact if $L^2$ is replaced by $L^p$ with $p < 2$ (a proof of these embedding results can be found in [1, Thm. 2.5]). For an exhaustive treatment of $\mathrm{BV}(\Omega)$ see [36, 68]. With this, we define

$$J(u) = \begin{cases}|Du|(\Omega) & \text{if } u\in\mathrm{BV}(\Omega),\\ +\infty & \text{else.}\end{cases}$$
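To give a feeling for the functional $J$, the following sketch evaluates a simple discrete counterpart of $|Du|(\Omega)$ for an image on a regular grid. The forward-difference, isotropic discretisation and the function name `discrete_tv` are assumptions made for illustration; the paper itself works with the continuous definition.

```python
import math


def discrete_tv(u, h=1.0):
    """Isotropic forward-difference approximation of |Du|(Omega).

    u : image as a list of rows (list of lists of floats)
    h : grid spacing; each pixel carries the area h * h
    """
    rows, cols = len(u), len(u[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            # Forward differences; the difference across the boundary is set to zero.
            dx = (u[i + 1][j] - u[i][j]) / h if i + 1 < rows else 0.0
            dy = (u[i][j + 1] - u[i][j]) / h if j + 1 < cols else 0.0
            total += math.hypot(dx, dy) * h * h
    return total


if __name__ == "__main__":
    # Indicator of the right half of the unit square on a 4 x 4 grid.
    edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    print(discrete_tv(edge, h=0.25))
```

For the indicator function of a half-square the value equals the length of the jump set, which reflects why minimising $J$ does not penalise sharp edges more than smooth transitions of the same height.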



The functional $J$ is convex and proper and, as it was shown e.g. in [H Thm. 2.3], $J$ is lower semi-continuous on $L^2(\Omega)$. This shows that $J$ satisfies Assumption 2.1 (ii). Next, we examine Assumption 3.4.

Lemma 4.10. If there exists $n_0\in\mathbb{N}$ such that $|\langle K\mathbf{1},\phi_{n_0}\rangle| > 0$ then Assumption 3.4 holds. Here, $\mathbf{1}$ denotes the constant 1-function on $\Omega$.

Proof. Let $c\in\mathbb{R}$ and $\{u_k\}_{k\in\mathbb{N}}\subset\Lambda(c)$. Then in particular it follows that $\sup_{k\in\mathbb{N}}J(u_k)\le c < \infty$ and thus we find with Poincaré's inequality [see ESI Thm. 5.11.1]

$$\|u_k - \bar u_k\|_{L^2}\le c_1 J(u_k)\le c_2 < \infty$$

for suitable constants $c_1, c_2\in\mathbb{R}$, where $\bar u_k = |\Omega|^{-1}\int_\Omega u_k(\tau)\,d\tau$. Now choose $\phi\in\{\phi_1,\ldots,\phi_N\}$ and observe that



^,K1)\ 



I ((/), Kufc) I ^ !((/>, Jr(nfc-nfc)) I ^ K'/', Kuk) \ 



^ WT^n II - II , \{KUk,(l)n)\ . II j^ii 

< A kifc — Ufc t2+ max r-^ — n < -f^ C2 + c. 



Let 1 < no < be such that i 
and we find 



'no I 



= :7>0. Then, |n„| < (||i^||c2 + c) 



^™ IlL 



'-nllL 



2 + ll^inllL^) < C2 + C3 



/7 =: C3 



□ 



We note that the assumptions in Lemma 4.10 already imply the weak compactness of the sets (11) and thus guarantee existence of a $J$-minimising solution of (1). From the above cited embedding properties of the space $\mathrm{BV}(\Omega)$ it is easy to derive an improved version of the consistency result in Theorem 3.6.



Corollary 4.11. Let $g\in\mathrm{span}\,\Phi$ and assume that $u^\dagger\in\mathrm{BV}(\Omega)$ is the unique $J$-minimising solution of (1). Moreover, let $\{\alpha_k\}_{k\in\mathbb{N}}$ and $\{N_k\}_{k\in\mathbb{N}}$ be as in Theorem 3.6 and define $\hat u_k = \hat u_{N_k}(\alpha_k)$. Then, additionally to the assertions in Theorem 3.6 we have that

$$\lim_{k\to\infty}\|\hat u_k - u^\dagger\|_{L^p} = 0\quad\text{a.s.}$$

for every $1\le p < 2$.

Proof. From Theorem 3.6 it follows that $\{\hat u_k\}_{k\in\mathbb{N}}$ is bounded a.s. in $L^2$ and that each weak cluster point is a $J$-minimising solution of (1). Since we assumed that $u^\dagger$ is the unique $J$-minimising solution of (1), it follows that $\hat u_k\rightharpoonup u^\dagger$ in $L^2$ a.s. and therefore also in $L^p$ for each $1\le p\le 2$.

Since $\Omega$ is assumed to be bounded, it follows that $L^2$ is continuously embedded into $L^1$. Thus, it follows from Theorem 3.6 that almost surely $\sup_{k\in\mathbb{N}}\|\hat u_k\|_{\mathrm{BV}} < \infty$. From the compact embedding $\mathrm{BV}(\Omega)\hookrightarrow L^p$ for $1\le p < 2$, it hence follows that $\{\hat u_k\}_{k\in\mathbb{N}}$ is compact in $L^p$. Thus, the assertion follows, since weak and strong limits coincide. □

Unfortunately, the above embedding technique cannot be used in order to improve the convergence rate result in Theorem 3.8 to strong $L^p$-convergence and thus we have to settle for the general results in Theorem 3.8. Therefore, we aim for an interpretation of convergence w.r.t. the Bregman-divergence in (20). We summarise:

Lemma 4.12. One has $\xi\in\partial J(u)$ if and only if there exists $z\in L^\infty(\Omega,\mathbb{R}^2)$ with $\|z\|_\infty\le 1$ such that $\langle z,\nu\rangle = 0$ on $\partial\Omega$,

$$\mathrm{div}(z) = \xi\quad\text{and}\quad\int_\Omega\xi u\,dx = |Du|(\Omega).$$

If $\xi\in\partial J(u)$, then $D_J^\xi(v,u) = |Dv|(\Omega) - \int_\Omega\xi v\,dx$.

Proof. The representation $D_J^\xi(v,u) = |Dv|(\Omega) - \int_\Omega\xi v\,dx$ directly follows from the definition of the Bregman-divergence and the first assertion of the Lemma. The latter was proved e.g. in [37, Thm. 4.4.2]. □
Remark 4.2. The result in Lemma 4.12 allows a geometrical interpretation of the Bregman-divergence w.r.t. the functional $J$. As it was worked out in [13, Sec. 5.1], one can show that

$$D_J^\xi(v,u) = \int_\Omega\big(1 - \cos(\gamma(v,u,x))\big)\,d|Dv|(x)$$

where $\gamma(v,u,x)$ denotes the angle between the unit normals of the sub-level sets of $u$ and $v$ at the point $x\in\Omega$.

We recall that a function $u\in\mathrm{BV}(\Omega)$ satisfies the source condition if there exists $\xi\in\mathrm{ran}(K^*)$ such that $\xi\in\partial J(u)$. It is important to note that in many applications the elements in $\mathrm{ran}(K^*)$ exhibit high regularity such as continuity or smoothness. Thus it is of particular interest whether such regular elements in $\partial J(u)$ exist. If $u$ is itself a smooth function, application of Green's Formula and Lemma 4.12 yields [see also 62, Lem. 3.71]:

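Lemma 4.13. Let $u\in C^1(\Omega)$ and set $E[u] = \{x\in\Omega : \nabla u(x)\ne 0\}$. Assume that there exists $z\in C_0^1(\Omega,\mathbb{R}^2)$ with $|z|\le 1$ and

$$z(x) = \frac{\nabla u(x)}{|\nabla u(x)|}\quad\text{for } x\in E[u].$$

Then, $\xi := \mathrm{div}(z)\in\partial J(u)$.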

In many applications (such as imaging) the true solution $u\in\mathrm{BV}(\Omega)$ is not continuous, as e.g. if $u$ is the indicator function of a smooth set $D\subset\Omega$. The following example shows that in this case we still have $\partial J(u)\cap C_0^\infty(\Omega)\ne\emptyset$. For the analytical details we refer to [62, Ex. 3.74].

Example 4.14. Assume that $D\subset\Omega$ is a closed and bounded set with $C^\infty$-boundary $\partial D$ and set $u = \chi_D$. The outward unit-normal $n$ of $D$ then can be extended to a compactly supported $C^\infty$ vector field $z$ with $|z|\le 1$. Independent of the choice of the extension, we then have $\xi := \mathrm{div}(z)\in\partial J(u)$ and $\xi\in C_0^\infty(\Omega)$.

Example 4.15. We consider $\Omega = [0,1]^2$ and $V = L^2(\Omega)$. Moreover, we assume that $\mathcal{P}_2$ denotes the set of all dyadic partitions of $\Omega$ (cf. Example B.9) and that $\Phi$ is the collection of indicator functions w.r.t. elements in $\mathcal{P}_2$.

For a function $k : \mathbb{R}^2\to\mathbb{R}$, we consider the convolution operator on $U$ defined by

$$(Ku)(x) = \int_{\mathbb{R}^2}k(x-y)\,\bar u(y)\,dy\quad\text{for } x\in\Omega,$$

where $\bar u$ denotes the extension of $u$ on $\mathbb{R}^2$ by zero-padding. Assume further that $u^\dagger$ is the indicator function on a closed and bounded set $D\subset\Omega$ with $C^\infty$-boundary $\partial D$ and that $\xi$ is as in Example 4.14. If the Fourier-transform $\mathcal{F}(k) =: \hat k$ of $k$ is non-zero a.e. in $\mathbb{R}^2$



and if there exists /3 G (1,2] such that 

(l + |.|2)-/3/2 I'l/^^ eL2(M2) 

then Assumption 



and supp p' := J" M,^/A; C r2 



3.7 



is satisfied. To be more precise, we have that g %p_i{yt) [see[21 Thm. 
7.63] and if there exists a constant k > such that := af!^ is summable it follows from 
Example 4.9 and Lemma |4.12 that 



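$$\limsup_{k\to\infty}\frac{|D\hat u_k|(\Omega) - \int_\Omega\xi\hat u_k\,dx}{\sigma_k\sqrt{-\log\sigma_k}} = \limsup_{k\to\infty}\frac{\int_\Omega 1 - \cos(\gamma(\hat u_k,u^\dagger,x))\,d|D\hat u_k|(x)}{\sigma_k\sqrt{-\log\sigma_k}} < \infty\quad\text{a.s.}$$

for the SMRE $\hat u_k = \hat u_{N_k}(\alpha_k)$ (where $N_k$ is as in Example 4.9).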
Appendix A. Source-condition and Bregman-divergence: Some examples

The notions of source-condition and Bregman-divergence are very common in the field of inverse problems. We will summarise the meaning of the source-condition (10) and the Bregman-divergence for some frequently used regularisation functionals $J$. We also note that in Section 4.3 the more complex example where $J$ is the total-variation of a measurable function on a domain $\Omega$ is studied in more detail.

Example A.1. Let $J(u) = \frac{1}{2}\|u\|^2$. Then, $J$ is differentiable on $U$ and for all $u\in U$ the set $\partial J(u)$ consists of the single element $\{u\}$. We have that $J'(v)(w) = \langle v,w\rangle$ and consequently

$$D_J^\xi(v,u) = D_J(v,u) = \frac{1}{2}\|v-u\|^2\quad\text{for }\xi = u\in\partial J(u).$$

Moreover, the source condition (10) can be rewritten to

$$u^\dagger\in\mathrm{ran}(K^*).$$

Since $\mathrm{ran}(K^*) = \mathrm{ran}((K^*K)^{1/2})$, this shows that the source condition (10) corresponds to the Hölder-source condition $u^\dagger\in\mathrm{ran}((K^*K)^\beta)$ for $\beta = 1/2$ [see 35]. In [7, Sec. 5.3], the Hölder-source condition w.r.t. a smoothing operator $K$ on Hilbert-scales has been discussed. To be more precise, assume that $\{H_t\}_{t\in\mathbb{R}}$ is a scale of Hilbert spaces and that $K$ is $a$-times smoothing, i.e. $K : H_t\to H_{t+a}$ is continuous with continuous inverse. Then the condition $u^\dagger = (K^*K)^\beta p^\dagger$ implies that $u^\dagger\in H_{2a\beta}$. A prototype for Hilbert scales are Sobolev spaces. Here the index $t$ corresponds to the Sobolev index.

Example A.2. Let $\{\psi_n\}_{n\in\mathbb{N}}$ be an ONB of $U$ and define

In applications this functional promotes sparse solutions, that is solutions that have only few non-zero coefficients w.r.t. the basis $\{\psi_n\}_{n\in\mathbb{N}}$. As argued in [12, Rem. 17] the

source-condition (10) holds if and only if there exist constants a, 6,7 > such that < a 

and 



\u' 



> 



for all u £ U such that ||n||;^ < a and ||/ir(u- 
set J C N the restriction of K to the set {ipn 
/3i , /32 > such that 



< b. If additionally for every finite 



n £ J} is injective, there exist constants 



u 



u 



for all u£U (see the proof of US Thm. 15] and [Ml Thm 6.4]). 
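Example A.3. Assume that $U = L^2(\Omega)$ for an open and bounded set $\Omega\subset\mathbb{R}^n$ with Lipschitz boundary $\partial\Omega$ and outer unit-normal $\nu$ and let $H^\beta(\Omega)$ denote the Sobolev-space of order $\beta\in\mathbb{R}$. We define

$$J(u) = \begin{cases}\frac{1}{2}\int_\Omega|\nabla u|^2\,dx & \text{if } u\in H^1(\Omega),\\ +\infty & \text{else.}\end{cases}$$

Then [see O pp.63], the set $D(\partial J)$ consists of all elements $u\in H^2(\Omega)$ that have vanishing normal derivative $\langle\nabla u,\nu\rangle$ on $\partial\Omega$ and if $u\in D(\partial J)$, then $\partial J(u) = \{-\Delta u\}$. With this, it follows that $J'(v)(w) = \langle\nabla v,\nabla w\rangle$ and

$$D_J(v,u) = D_J^\xi(v,u) = \frac{1}{2}\|\nabla(v-u)\|^2\quad\text{for }\xi = -\Delta u\in\partial J(u).$$

Moreover, $u^\dagger$ satisfies the source condition (10) with source element $p^\dagger\in V$ if and only if

$$-(K^*p^\dagger)(x) = \Delta u^\dagger(x)\quad\text{in }\Omega,$$
$$\nabla u^\dagger\cdot\nu = 0\quad\mathcal{H}^{n-1}\text{-a.e. on }\partial\Omega$$

(here $\mathcal{H}^{n-1}$ stands for the $(n-1)$-dimensional Hausdorff-measure on $\partial\Omega$).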



Example A.4. Let $U$ be as in Example A.3 and define the negentropy by

$$J(u) = \begin{cases}\int_\Omega u\log u\,dx & \text{if } u\ge 0\text{ a.e. and } u\log u\in L^1(\Omega),\\ +\infty & \text{else.}\end{cases}$$

Then [seeUl Chap. 2 Prop 2.7], the set $D(\partial J)$ consists of all non-negative functions in $L^\infty(\Omega)$ that are bounded away from zero. One has $J'(v)(w) = \langle 1+\log v, w\rangle$ and if $u\in D(\partial J)$, then $\partial J(u) = \{1+\log u\}$. After some re-arrangements we find

$$D_J(v,u) = D_J^\xi(v,u) = \int_\Omega\Big(v\log\frac{v}{u} - v + u\Big)\,dx,$$

that is, the Bregman-divergence coincides in this particular case with the Kullback-Leibler-divergence. It was proved in [10, Lem. 2.2] that

h-u\\li < ||i;||li + ^ ||w||li^ Dj{v,u). 

In other words, Bregman-consistency (or convergence rates) w.r.t. the negentropy yields strong consistency (convergence rates) in $L^1$. Finally, we note that $u^\dagger\in D(\partial J)$ satisfies the source condition (10) with source element $p^\dagger\in V$ if and only if

$$e^{(K^*p^\dagger)(x)-1} = u^\dagger(x)\quad\text{for a.e. } x\in\Omega.$$
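The closed-form Bregman-divergences of Examples A.1 and A.4 can be verified numerically on a discretised domain. The sketch below is an illustration under simplifying assumptions: functions are represented by finite vectors of positive cell values with unit cell weight, and the generic helper `bregman` (a name chosen here) evaluates $J(v) - J(u) - \langle J'(u), v-u\rangle$ for differentiable $J$.

```python
import math


def bregman(J, grad_J, v, u):
    """D_J(v, u) = J(v) - J(u) - <grad J(u), v - u> for differentiable J."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_J(u), v, u))
    return J(v) - J(u) - inner


def quadratic(u):
    """J(u) = 0.5 * ||u||^2 as in Example A.1."""
    return 0.5 * sum(x * x for x in u)


def negentropy(u):
    """Discrete analogue of J(u) = int u log u dx for positive u (Example A.4)."""
    return sum(x * math.log(x) for x in u)


if __name__ == "__main__":
    v, u = [1.0, 2.0], [0.5, 1.0]
    # Example A.1: the divergence equals 0.5 * ||v - u||^2.
    print(bregman(quadratic, lambda w: list(w), v, u))
    # Example A.4: the divergence equals sum(v log(v/u) - v + u).
    print(bregman(negentropy, lambda w: [1.0 + math.log(x) for x in w], v, u))
```

In both cases the generic definition reproduces the explicit expressions stated in the examples, up to floating-point error.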
Appendix B. Proofs 

B.1. Proofs of the main results. In this section the proofs of the main results, that is existence, consistency and convergence rates for SMRE, are collected. We start with a basic estimate for the quantile function $q_N(\cdot)$ of the MR-statistic as defined in (14). We shall assume that Assumptions 2.1 and 3.4 hold.

Lemma B.1. Assume that $T_N$ is an MR-statistic and let $\alpha\in(0,1)$ and $N\in\mathbb{N}$. Then,

$$q_N(\alpha)\le\mathrm{med}(T_N(\varepsilon)) + L\sqrt{-2\log(2\alpha)}.$$
Proof. First, we introduce the function $f(x_1,\ldots,x_N) = \max_{1\le n\le N}t_N(|x_n|,\|\phi_n\|)$. Then, $f$ is Lipschitz continuous with $\|f\|_{\mathrm{Lip}}\le L$. Next, define for $1\le n\le N$ the random variables $\varepsilon_n := \varepsilon(\phi_n^*)$. Then, $(\varepsilon_1,\ldots,\varepsilon_N)\sim\mathcal{N}(0,\Sigma)$ for a symmetric and positive matrix $\Sigma\in\mathbb{R}^{N\times N}$ with $\|\Sigma\|_2 = 1$. Hence

$$T_N(\varepsilon) = \max_{1\le n\le N}t_N(|\varepsilon(\phi_n^*)|,\|\phi_n\|) = f(\varepsilon_1,\ldots,\varepsilon_N) = f(\Sigma^{1/2}Z),$$

where $Z$ is an $N$-dimensional random vector with independent standard normal components. In other words, the statistic $T_N(\varepsilon)$ can be written as the image of $Z$ under the Lipschitz function $f(\Sigma^{1/2}\,\cdot)$. Applying Borell's inequality [see 65, Lem. A.2.2] we find that $2\,\mathbb{P}(T_N(\varepsilon) - \mathrm{med}(T_N(\varepsilon)) > L\eta)\le\exp(-\eta^2/2)$ for all $\eta\in\mathbb{R}$. Now let $\alpha\in(0,1)$, choose $q < q_N(\alpha)$ and set $\eta = (q - \mathrm{med}(T_N(\varepsilon)))/L$. Then, $\mathbb{P}(T_N(\varepsilon)\le q) < 1-\alpha$ and hence

$$\alpha = 1-(1-\alpha) < 1 - \mathbb{P}(T_N(\varepsilon)\le q) = \mathbb{P}(T_N(\varepsilon) > q)\le\frac{1}{2}\exp\Big(-\frac{1}{2}\Big(\frac{q - \mathrm{med}(T_N(\varepsilon))}{L}\Big)^2\Big).$$

Rearranging the above inequality yields

$$q\le\mathrm{med}(T_N(\varepsilon)) + L\sqrt{-2\log(2\alpha)},\quad\text{for all } q < q_N(\alpha).$$

The assertion follows for $q\to q_N(\alpha)$. □
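The quantile bound of Lemma B.1 can be checked by simulation. The following sketch assumes the Gaussian sequence setting of Section 4.1 (independent standard normal coefficients, $t_N(s,r) = s - \sqrt{2\log N}$, hence Lipschitz constant $L = 1$) and reads $q_N(\alpha)$ as the $(1-\alpha)$-quantile of $T_N(\varepsilon)$. All function names are illustrative, and the empirical median and quantile only approximate the exact ones.

```python
import math
import random
import statistics


def noise_statistic_samples(n, draws, seed=0):
    """Monte Carlo samples of T_N(eps) = max_n |eps_n| - sqrt(2 log N)."""
    rng = random.Random(seed)
    shift = math.sqrt(2.0 * math.log(n))
    return [
        max(abs(rng.gauss(0.0, 1.0)) for _ in range(n)) - shift
        for _ in range(draws)
    ]


def empirical_quantile(samples, alpha):
    """Empirical (1 - alpha)-quantile of the samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, math.ceil((1.0 - alpha) * len(ordered)) - 1)
    return ordered[index]


def quantile_bound(samples, alpha, lipschitz=1.0):
    """Right-hand side med(T_N) + L * sqrt(-2 log(2 alpha)), for alpha < 1/2."""
    return statistics.median(samples) + lipschitz * math.sqrt(-2.0 * math.log(2.0 * alpha))


if __name__ == "__main__":
    samples = noise_statistic_samples(n=20, draws=2000)
    for alpha in (0.2, 0.05, 0.01):
        print(alpha, empirical_quantile(samples, alpha), quantile_bound(samples, alpha))
```

The bound is not tight, which is typical for concentration inequalities of Borell type, but it captures the $\sqrt{-\log\alpha}$ growth of the quantile as $\alpha\to 0$.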



We proceed with the proof of the existence result in Theorem 3.5. To this end we use a standard compactness argument from convex optimisation. For the sake of completeness, however, we will present the proof.

Proof of Theorem 3.5. Let $N\ge N_0$ and $y\in V$ be arbitrary. Due to Assumption 2.1 (ii), $D(J)\subset U$ is dense and hence there exists for all given $\delta > 0$ an element $u_0\in D(J)$ such that $\|Ku_0 - \bar y\|\le\delta$, where $\bar y$ denotes the orthogonal projection of $y$ onto $\mathrm{ran}(K)$. Since $\phi_n\in\mathrm{ran}(K)$ and $\|\phi_n^*\| = 1$ for all $n\in\mathbb{N}$, this implies that $|\langle Ku_0 - y,\phi_n^*\rangle| = |\langle Ku_0 - \bar y,\phi_n^*\rangle|\le\delta$ for all $n\in\mathbb{N}$.

Now let $\sigma > 0$ and $\alpha\in(0,1)$. Since $T_N$ is an MR-statistic (cf. Definition 3.1) we find that $t_N(0,r)\le 0$ for all $r\in(0,1]$. Thus, according to the reasoning above, there exists $u_0\in D(J)$ such that for $1\le n\le N$

La~^ \yn - {Kuo,(t>D\ < 9iv(a) - max,^Af(ll</'n||), (29) 

l<n<A' 



if the right-hand side is positive. To see this, assume that (?iv(a) < maxi<„<jv Aat^ 
Since for 1 < n < we have that tAr(|e((/)* )| , \\4>n\\) > ■^A''(ll'An||) almost surely according to 



(13), it then follows that 



P (7V(e) > qNia)) > P TN{e) > max Ajv 

l<n<N 



This is a contradiction to the definition of (j'Ar(a) in (14) and thus uq G D{J) as in (29) can 
be chosen. Since s 1— )• tfyf{s,r) is Lipschitz-continuous with constant L and increasing for all 
r G (0,1], we find tN{(^-^ \yn - {Kuo,(^l)\ ,Un\\) < tN{0,Un\\) + La-^ IVn - {Kuq, c^Dl < 
qNiot) for 1 < n < A^. In other words, there exists at least one element uq G D{J) such that 

UoeS:=\ueU : max tN{cr~^ \yn - {Ku,(l)l)\ ,\\<pn\\) < qN{a) 

l<n<N 
Now, choose a sequence $\{u_k\}_{k\in\mathbb{N}}\subset S$ such that $J(u_k)\to\inf_{u\in S}J(u)$. This shows that $\sup_{k\in\mathbb{N}}J(u_k) =: a < \infty$. Moreover, we find from (13) that there exist constants $c_1, c_2 > 0$ such that for all $1\le n\le N$



{KUk,4)n)\ +C2tN{\yn - {KUk,(t>n)\ > \\4>n\\) 

< tN{cr~^ IVn - {KUk,(t)n)\ ' \\<t^n\ 



< qN{a) 



Together with (12), this shows cia ^ |y„ — {Kuk,(pn)\ + C2AAr(||0„||) < gAr(a). Rearranging 
the inequality above yields 



max \{KukA*n)\ < max J 2/" I + 

l<ra<Af l<n<A' Ci 



qN{c()-C2 inf XN{\\4>n\ 

l<n<N 



-: b < oo. 



Summarising, we find that $u_k\in\Lambda(a+b)$ for all $k\in\mathbb{N}$, as a consequence of which we can extract a weakly convergent sub-sequence (indexed by $\rho(k)$ say) with weak limit $\hat u$. Since we assumed that $t_N(\cdot,r)$ is convex for all $r\in(0,1]$, it follows that the admissible region $S$ is convex and closed and therefore weakly closed. This shows that $\hat u\in S$. Moreover, the weak lower semi-continuity of $J$ (cf. Assumption 2.1 (ii)) implies

$$J(\hat u)\le\liminf_{k\to\infty}J(u_{\rho(k)}) = \inf_{u\in S}J(u)$$

and the assertion follows with $\hat u_N(\alpha) = \hat u$. □



In order to prove Bregman-consistency of SMR-estimation in Theorem 3.6, we first establish a basic estimate for the data error.

Lemma B.2. Let $N\ge N_0$ and $\alpha\in(0,1)$. Moreover, assume that $u^\dagger$ is a solution of (1) and that $\hat u_N(\alpha)$ is an SMRE. Then, for $1\le n\le N$

$$c_1\sigma^{-1}|\langle Ku^\dagger - K\hat u_N(\alpha),\phi_n^*\rangle|\le T_N(\varepsilon) - 2c_2\lambda_N(\|\phi_n\|) + \mathrm{med}(T_N(\varepsilon)) + L\sqrt{-2\log(2\alpha)}.$$

Proof. From Definition 3.3 it follows that $t_N(\sigma^{-1}|\langle Ku^\dagger - K\hat u_N(\alpha) + \sigma\varepsilon,\phi_n^*\rangle|,\|\phi_n\|)\le q_N(\alpha)$ for $1\le n\le N$. The convexity of $t_N$ hence implies that



< I {tNi<J-' \{Y - Kmia),cl>l)\ , UnW) + tjv(k(C)| 



< ^{qj^ia)+TN{e)). 



By setting v = {2a) ^ \ {Ku'< - K-UAr(a), </>* ) | and 
that 



2a 



Ku^ — Kuj\f{a), (j)* 



+ C2tN 



Kv) — Kuj\f{a), 4>. 



in (13), the above estimate shows 

QNia) + Tjyie) 



< 



Since tN{v,r) > AAr(r) for all v G and r G (0, 1] (cf. (12)) this implies for 1 < n < 



cia 



(Kv) - KuN{a),(t>n) < qN{a) + TN{e) - 2c2XN{\\<t>r^ 



Finally, the assertion follows from Lemma B.l 



□ 



With these preparations, we are now able to prove Bregman-consistency. 
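Proof of Theorem 3.6. By the definition of the SMRE $\hat u_k = \hat u_{N_k}(\alpha_k)$, it follows that

$$\mathbb{P}(J(\hat u_k) > J(u^\dagger))\le\mathbb{P}(T_{N_k}(\sigma_k^{-1}(Y - Ku^\dagger)) > q_{N_k}(\alpha_k)) = \mathbb{P}(T_{N_k}(\varepsilon) > q_{N_k}(\alpha_k))\le\alpha_k$$

for all $k\in\mathbb{N}$. Since $\sum_{k=1}^{\infty}\alpha_k < \infty$, it follows from the Borel-Cantelli Lemma [see 63, p. 255] that $\mathbb{P}(J(\hat u_k) > J(u^\dagger)\text{ i.o.})\le\mathbb{P}(T_{N_k}(\varepsilon) > q_{N_k}(\alpha_k)\text{ i.o.}) = 0$, or in other words

$$\mathbb{P}\big(\exists k_0\in\mathbb{N} : J(\hat u_k)\le J(u^\dagger)\text{ for all } k\ge k_0\big) = 1. \qquad (30)$$

In particular, it follows that $\sup_{k\in\mathbb{N}}J(\hat u_k) =: a < \infty$ a.s.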

Next, we note that supjycj^ r/v(e) < oo a.s. implies that sup^gj^ med(Tjv(e)) < oo. Hence, 



B.2 



it follows from Lemma 
surely, as A; — )• oo which proves (18) 



and (16) that maxi<n<Nk 



Kuk, 



supfcgjij maxi<„<Aro | =: 6 < oo a.s. 



0{(k) almost 



In particular, (18) and the fact hat A^^ > A'^o imply 



is sequentially weakly precompact according to Assumption 3.4 
indexed by p{k) with weak limit u £ U. Since N/^. 
(flel) that 



Summarising, we find that Uk G A(a + b) which 
. Choose a sub-sequence 



oo as A; — )■ oo it follows from (18) and 



\{g-Ku, 



lim 

fc— >oo 



Ku"^ — Ku 



for all n e N. 



Since we assumed that $g\in\mathrm{span}\,\Phi$ this shows that $K\hat u = g$. Furthermore, according to (30) there exists (almost surely) an index $k_0$ such that $J(\hat u_{\rho(k)})$ does not exceed $J(u^\dagger)$ for all $k\ge k_0$. Together with the weak lower semi-continuity of $J$ this shows $J(\hat u)\le\liminf_{k\to\infty}J(\hat u_{\rho(k)})\le\limsup_{k\to\infty}J(\hat u_{\rho(k)})\le J(u^\dagger)$. Since $u^\dagger$ is a $J$-minimising solution of (1) we conclude that the same holds for $\hat u$ and that $J(\hat u) = J(u^\dagger) = \lim_{k\to\infty}J(\hat u_{\rho(k)})$. In particular, for each sub-sequence of $\{J(\hat u_k)\}_{k\in\mathbb{N}}$ there exists a further sub-sequence that converges to $J(u^\dagger)$. This already shows that $\lim_{k\to\infty}J(\hat u_k) = J(u^\dagger)$ a.s.

We next prove that $D_J(u^\dagger,\hat u_k)\to 0$. To this end, recall that there almost surely exists an index $k_0$ such that for $k\ge k_0$ one has $T_{N_k}(\varepsilon)\le q_{N_k}(\alpha_k)$. In order to exploit strong duality arguments, however, we have to make sure that the interior of the admissible region is non-empty (Slater's constraint qualification). But since we assumed that $s\mapsto t_N(s,r)$ is (strictly) increasing for each fixed $r\in(0,1]$ it follows that $\mathbb{P}(t_{N_k}(|\varepsilon(\phi_n^*)|,\|\phi_n\|) = q_{N_k}(\alpha_k)) = 0$ for all $n\in\mathbb{N}$ and thus

$$\mathbb{P}\big(\exists k_0 : T_{N_k}(\varepsilon) < q_{N_k}(\alpha_k)\text{ for all } k\ge k_0\big) = 1.$$
By introducing the functional 



(31) 



Gk{v) 




else, 



we can rewrite ([3| into Uk G argmin^gj; J(u) + Gk{Ku). From (31) it follows that lies in 
the interior of the admissible set of the convex problem ([s]). In other words, the functionals 
Gk are continuous at Ku"^ for k large enough. Therefore we can apply [34i. Chap. II Prop. 
4.1] (cf. also Chapter II, Remark 4.2 therein) and choose an element G F such that 
K*(^k S dJ{uk) and —E,k £ dGk{Kuk)- The second inclusion and the definition of the sub- 
gradient show that Gk{Ku) > Gk{uk) — {^k-,Ku — Kuk) = {K*(^k,Uk — u) for all u £ U. In 
particular, satisfies TNf,{a^^{Y — Ku"^)) = T^^ie) < qNf,{oik) and thus Gk{Kv)) = 0. This 
shows > (K*$^i^,Uk — u^). Since J{u}S) — J{u^) we find
< limsupZ)j(u^,'Ufc) < limsupD^ ^''{u\uk



fc— >oo 



lim sup J(ti^) — J(ufc) 

fc— >oo 



K*^, v) — Uk) < lini sup J(n^) — J{uk) = 0. 



This proves (17). 



□ 



It remains to prove the convergence rate results in Theorem 3.8 To this end additional 
regularity of the true J-minimising solutions of ([T]) has to be taken into account. This is 
formulated in Assumption |3.7[ With this we get the following basic estimate. 



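Lemma B.3. Assume that Assumption 3.7 holds and let $N \ge N_0$ and $\alpha \in (0,1)$. Then

\[
|\langle K^*p^\dagger, \hat u_N(\alpha) - u^\dagger\rangle| \le \frac{\sigma}{c_1}\Big( \bar T_N(\varepsilon) - 2c_2 \inf_{1\le n\le N} \lambda_N(\|\phi_n\|) + [\ldots]\sqrt{-2\log(2\alpha)} \Big) \sum_{n=1}^{N} |b_{n,N}| + \rho_N \|K\hat u_N(\alpha) - Ku^\dagger\|,
\]

where $\bar T_N(\varepsilon) = T_N(\varepsilon) + \operatorname{med}(T_N(\varepsilon))$.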
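Proof. From Assumption 3.7 we find that

\[
|\langle K^*p^\dagger, \hat u_N(\alpha) - u^\dagger\rangle| \le \Big|\Big\langle \sum_{n=1}^{N} b_{n,N}\phi_n, K\hat u_N(\alpha) - Ku^\dagger\Big\rangle\Big| + \rho_N \|K\hat u_N(\alpha) - Ku^\dagger\| \le \sum_{n=1}^{N} |b_{n,N}| \max_{1\le n\le N} |\langle \phi_n, K\hat u_N(\alpha) - Ku^\dagger\rangle| + \rho_N \|K\hat u_N(\alpha) - Ku^\dagger\|.
\]

From Lemma B.2 it follows that

\[
\max_{1\le n\le N} |\langle \phi_n, K\hat u_N(\alpha) - Ku^\dagger\rangle| \le \frac{\sigma}{c_1}\Big( \bar T_N(\varepsilon) - 2c_2 \inf_{1\le n\le N} \lambda_N(\|\phi_n\|) + [\ldots]\sqrt{-2\log(2\alpha)} \Big),
\]

which shows the assertion. □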



Combination of the auxiliary result in Lemma B.3 with Theorem 3.6 paves the way to the proof of Theorem 3.8.



Proof of Theorem 3.8. First, observe that Assumption 3.7 and the definition of $\eta_k$ imply (16), that is, all assumptions in Theorem 3.6 are satisfied. Therefore $\{u_k\}_{k\in\mathbb{N}}$ is bounded almost surely and due to the continuity of $K$ we find that $\sup_{k\in\mathbb{N}} \|Ku_k - Ku^\dagger\| < \infty$ a.s. After setting $B := \sup_{N\in\mathbb{N}} \sum_{n=1}^{N} |b_{n,N}|$, which is finite according to Assumption 3.7, it follows from Lemma B.3 and the definition of $\eta_k$ that

\[
|\langle K^*p^\dagger, u_k - u^\dagger\rangle| \le \frac{B\sigma_k}{c_1}\, \bar T_{N_k}(\varepsilon) + C\eta_k \tag{32}
\]

for a suitably chosen constant $C > 0$. Since $\sup_{N\in\mathbb{N}} T_N(\varepsilon) < \infty$ almost surely, it follows that also $\sup_{N\in\mathbb{N}} \bar T_N(\varepsilon) = \sup_{N\in\mathbb{N}} \big(T_N(\varepsilon) + \operatorname{med}(T_N(\varepsilon))\big) < \infty$ a.s. Combining this with (32) shows

\[
|\langle K^*p^\dagger, u_k - u^\dagger\rangle| = O(\eta_k) \quad \text{a.s.}
\]

Next, recall from (30) in the proof of Theorem 3.6 that almost surely an index $k_0$ can be chosen such that for all $k \ge k_0$ one has $J(u_k) \le J(u^\dagger)$. This shows that

\[
D_J^{K^*p^\dagger}(u_k, u^\dagger) = J(u_k) - J(u^\dagger) - \langle K^*p^\dagger, u_k - u^\dagger\rangle \le |\langle K^*p^\dagger, u_k - u^\dagger\rangle| = O(\eta_k)
\]

for $k \ge k_0$. This proves the first estimate in (20). The second estimate follows directly from
Lemma lEl □ 

B.2. Approximation of continuous functions and entropy estimates. In this section we collect some results on the approximation properties and entropy estimates for systems of piecewise constant functions defined on a convex and compact set $\Omega \subset \mathbb{R}^d$ ($d \ge 1$). We start with the following basic definition.



Definition B.4. Let $\Omega \subset \mathbb{R}^d$ be compact and convex.

(i) For a function $g : \Omega \to \mathbb{R}$, the modulus of continuity is defined by

\[
\omega(\delta, g) = \sup_{|s-t| \le \delta} |g(s) - g(t)| \quad \text{for } \delta > 0.
\]

(ii) A function $g : \Omega \to \mathbb{R}$ is called Hölder-continuous with exponent $\beta \in (0,1]$ if $\omega(\delta, g) = O(\delta^\beta)$. The collection of all functions on $\Omega$ that are Hölder-continuous with exponent $\beta$ is denoted by $\mathcal{H}_\beta(\Omega)$.
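The modulus of continuity can be approximated on a grid, and the Hölder exponent can be read off as the slope of $\log \omega(\delta, g)$ against $\log \delta$. The following Python sketch illustrates this under stated assumptions: the domain is $[0,1]$, the test function is $g(s) = \sqrt{s}$ (for which $\omega(\delta, g) = \sqrt{\delta}$), and the grid and function names are hypothetical choices made for this example.

```python
import math


def modulus_of_continuity(g, grid, delta):
    """Grid approximation of omega(delta, g) = sup_{|s-t| <= delta} |g(s) - g(t)|."""
    tol = 1e-12  # guards against rounding when |s - t| equals delta on the grid
    best = 0.0
    for i, s in enumerate(grid):
        for t in grid[i:]:
            if t - s > delta + tol:
                break  # the grid is sorted, so later points are even farther away
            best = max(best, abs(g(s) - g(t)))
    return best


def fitted_exponent(deltas, omegas):
    """Least-squares slope of log(omega) against log(delta)."""
    xs = [math.log(d) for d in deltas]
    ys = [math.log(w) for w in omegas]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


grid = [i / 400 for i in range(401)]          # uniform grid on [0, 1]
deltas = [0.01, 0.02, 0.04, 0.08, 0.16]       # all multiples of the grid spacing
omegas = [modulus_of_continuity(math.sqrt, grid, d) for d in deltas]
beta_hat = fitted_exponent(deltas, omegas)    # close to 0.5 for the square root
```

For the square root the supremum is attained at $s = 0$, $t = \delta$, so the fitted slope reproduces the exponent $\beta = 1/2$ up to rounding.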

The following lemma provides an error estimate for the approximation of a continuous function $g : \Omega \subset \mathbb{R}^d \to \mathbb{R}$ by piecewise constant functions in terms of the modulus of continuity.

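Lemma B.5. Let $\Omega \subset \mathbb{R}^d$ be a compact and convex set and $\{A_1, A_2, \ldots\}$ be a collection of measurable sub-sets of $\Omega$. Assume that there exists an increasing sequence $\{n_l\}_{l\in\mathbb{N}} \subset \mathbb{N}$ with $n_0 = 0$ such that

(i) for all $n_l + 1 \le i < j \le n_{l+1}$ one has $|A_i \cap A_j| = 0$,

(ii) and $\Omega = A_{n_l+1} \cup \ldots \cup A_{n_{l+1}}$

for all $l \in \mathbb{N}$. Then, for all continuous $g : \Omega \to \mathbb{R}$ there exist coefficients $b_j^m$ such that

\[
\sup_{m\in\mathbb{N}} \sum_{l=0}^{m} \sum_{j=n_l+1}^{n_{l+1}} [\ldots] \le \|g\|_\infty \quad \text{and} \quad \Big\| g - \sum_{l=0}^{m} \sum_{j=n_l+1}^{n_{l+1}} b_j^m \chi_{A_j} \Big\|^2 \le \frac{m+1}{\sum_{u=0}^{m} \omega^{-2}(\delta_u, g)},
\]

where $\delta_l := \max_{n_l < j \le n_{l+1}} \operatorname{diam}(A_j)$.

Proof. Let $g : \Omega \to \mathbb{R}$ be continuous. For $l \in \mathbb{N}$ we define

\[
g_l := \sum_{j=n_l+1}^{n_{l+1}} \frac{1}{|A_j|} \int_{A_j} g(\tau)\, d\tau \cdot \chi_{A_j}.
\]

Next, we introduce $a_{lm} = \omega^{-2}(\delta_l, g) \big/ \sum_{u=0}^{m} \omega^{-2}(\delta_u, g)$ for $m \in \mathbb{N}$ and $0 \le l \le m$. Note that $a_{lm} \in (0,1)$ and $\sum_{0\le l\le m} a_{lm} = 1$. With this, we define for $0 \le l \le m$ and $n_l < j \le n_{l+1}$ the coefficients $b_j^m = \big(a_{lm} \int_{A_j} g(\tau)\, d\tau\big) / |A_j|$. Since we assumed that $g$ is continuous on the compact set $\Omega$, it follows that $|b_j^m| \le \|g\|_\infty\, a_{lm}$ and hence $\sum_{l=0}^{m} \sum_{j=n_l+1}^{n_{l+1}} [\ldots] \le \|g\|_\infty$ for all $m \in \mathbb{N}$. Moreover, we have for all $s \in \Omega$ that

\[
\sum_{l=0}^{m} a_{lm} g_l(s) - g(s) = \sum_{l=0}^{m} a_{lm} \Big( \sum_{j=n_l+1}^{n_{l+1}} \frac{1}{|A_j|} \int_{A_j} \big(g(\tau) - g(s)\big)\, d\tau \cdot \chi_{A_j}(s) \Big).
\]

After applying Jensen's inequality and keeping in mind that $|s - t| \le \delta_l$ for $s, t \in A_j$ and $n_l < j \le n_{l+1}$ it follows that

\[
\int_\Omega \Big| \sum_{l=0}^{m} a_{lm} g_l(s) - g(s) \Big|^2 ds \le \sum_{l=0}^{m} a_{lm} \int_\Omega \Big( \sum_{j=n_l+1}^{n_{l+1}} \frac{1}{|A_j|} \int_{A_j} |g(\tau) - g(s)|\, d\tau \cdot \chi_{A_j}(s) \Big)^2 ds \le \sum_{l=0}^{m} a_{lm} \sum_{j=n_l+1}^{n_{l+1}} \frac{1}{|A_j|} \int_{A_j} \int_{A_j} |g(\tau) - g(s)|^2\, d\tau\, ds.
\]

Assumptions (i) and (ii) together with the definition of the coefficients $a_{lm}$ eventually yield

\[
\int_\Omega \Big| \sum_{l=0}^{m} a_{lm} g_l(s) - g(s) \Big|^2 ds \le \frac{m+1}{\sum_{u=0}^{m} \omega^{-2}(\delta_u, g)}.
\]

□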
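The construction in the proof can be tried out numerically. The following Python sketch is a one-dimensional illustration under assumptions that are not part of the lemma: $\Omega = [0,1]$ (so that $|\Omega| = 1$), level $l$ consists of the $2^l$ dyadic intervals of length $\delta_l = 2^{-l}$, the test function is $g(s) = \sqrt{s}$ with $\omega(\delta, g) = \sqrt{\delta}$, and all integrals are replaced by midpoint sums on a fine grid. The function names are hypothetical.

```python
import math


def cell_average_approx(samples, level):
    """Replace samples by their mean on each of the 2**level equal cells of [0, 1]."""
    cell_len = len(samples) // 2 ** level
    out = []
    for start in range(0, len(samples), cell_len):
        cell = samples[start:start + cell_len]
        out.extend([sum(cell) / cell_len] * cell_len)
    return out


def weighted_approximation_error(g, omega, m, fine=2 ** 12):
    """Squared L2 error of sum_l a_lm g_l and the bound (m+1) / sum_u omega(delta_u)^-2."""
    # midpoint samples of g on a fine grid that is aligned with all dyadic cells
    samples = [g((i + 0.5) / fine) for i in range(fine)]
    inv_sq = [omega(2.0 ** -l) ** -2 for l in range(m + 1)]   # delta_l = 2**-l
    weights = [w / sum(inv_sq) for w in inv_sq]               # a_lm, summing to one
    approx = [0.0] * fine
    for level, a in enumerate(weights):
        g_l = cell_average_approx(samples, level)
        approx = [p + a * q for p, q in zip(approx, g_l)]
    err_sq = sum((p - s) ** 2 for p, s in zip(approx, samples)) / fine
    bound = (m + 1) / sum(inv_sq)
    return err_sq, bound, weights


err_sq, bound, weights = weighted_approximation_error(math.sqrt, math.sqrt, m=5)
```

In this example the weights are $a_{lm} = 2^l / (2^{m+1} - 1)$, so finer levels receive more weight, and the bound equals $(m+1)/(2^{m+1}-1)$.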



For the remainder of this section we collect some results concerning the capacity number of (subsystems of) the set of indicator functions on convex and closed sets in $[0,1]^d$ with $d \ge 1$. We first recall the basic definition.

Definition B.6. Let $(T,d)$ be a semi-metric space, $T' \subset T$ and $\varepsilon > 0$. The capacity number is defined by

\[
D(\varepsilon, T') := \sup\big\{ \#T'' : T'' \subset T',\ d(a,b) > \varepsilon \text{ for all } a \ne b \in T'' \big\}.
\]



From a practical point of view, it is often more convenient to express (27) in terms of the $\varepsilon$-covering number $N(\varepsilon, T')$ of $T'$ which is defined as the smallest number of $\varepsilon$-balls in $T$ needed to cover $T'$ (the center points need not be elements of $T'$, though). It is common knowledge [see 65, p. 98] that for all $\varepsilon > 0$

\[
N(\varepsilon, T) \le D(\varepsilon, T) \le N(\varepsilon/2, T). \tag{33}
\]
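The relation between covering and capacity numbers can be checked by hand on small examples. The following Python sketch is a toy illustration under simplifying assumptions that are not part of the text: $T = T'$ is a finite set of points on the real line with $d(a,b) = |a-b|$, balls are closed, and separation is strict. In this one-dimensional setting both quantities can be computed exactly by a greedy sweep; the function names are hypothetical.

```python
import random


def packing_number(points, eps):
    """Largest number of points with pairwise distance > eps (exact on the real line)."""
    count, last = 0, None
    for p in sorted(points):
        if last is None or p - last > eps:
            count, last = count + 1, p
    return count


def covering_number(points, eps):
    """Fewest closed eps-balls centred at given points that cover all points (exact in 1-D)."""
    pts = sorted(points)
    count, i = 0, 0
    while i < len(pts):
        left = pts[i]
        # rightmost admissible centre that still covers the leftmost uncovered point
        j = i
        while j + 1 < len(pts) and pts[j + 1] - left <= eps:
            j += 1
        centre = pts[j]
        # skip every point covered by the ball around that centre
        while i < len(pts) and pts[i] - centre <= eps:
            i += 1
        count += 1
    return count


rng = random.Random(0)                     # fixed seed, hypothetical test data
points = rng.sample(range(200), 40)
for eps in (2, 4, 8, 16):
    n_eps = covering_number(points, eps)
    d_eps = packing_number(points, eps)
    n_half = covering_number(points, eps / 2)
    print(eps, n_eps, d_eps, n_half)       # each row satisfies n_eps <= d_eps <= n_half
```

The first inequality holds because a maximal separated set is itself the set of centres of a cover, the second because a closed ball of radius $\varepsilon/2$ contains at most one point of an $\varepsilon$-separated set.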

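We consider the set of indicator functions on convex and closed sets in $[0,1]^d$ as a metric space with the induced $L^2$-metric, i.e. for two such indicator functions $\chi_P$ and $\chi_Q$ we have

\[
d(\chi_Q, \chi_P)^2 = \|\chi_P - \chi_Q\|^2 = \int_{[0,1]^d} (\chi_Q - \chi_P)^2\, dx = |Q \,\triangle\, P|.
\]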
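The identity between the squared $L^2$-distance of two indicator functions and the volume of the symmetric difference is easy to verify for rectangles. The following Python sketch is an illustration only; it assumes axis-parallel rectangles in $[0,1]^2$ given by their lower and upper corners, uses $|P \triangle Q| = |P| + |Q| - 2|P \cap Q|$, and compares the result with a midpoint rule. The function names and the two example rectangles are hypothetical.

```python
def rect_volume(lo, hi):
    """Volume of the axis-parallel rectangle prod_i [lo_i, hi_i] (zero if it is empty)."""
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= max(b - a, 0.0)
    return vol


def squared_l2_distance(p, q):
    """||chi_P - chi_Q||^2 = |P| + |Q| - 2 |P n Q|, the volume of the symmetric difference."""
    (p_lo, p_hi), (q_lo, q_hi) = p, q
    inter_lo = [max(a, b) for a, b in zip(p_lo, q_lo)]
    inter_hi = [min(a, b) for a, b in zip(p_hi, q_hi)]
    return (rect_volume(p_lo, p_hi) + rect_volume(q_lo, q_hi)
            - 2.0 * rect_volume(inter_lo, inter_hi))


def grid_check(p, q, n=200):
    """Midpoint-rule value of the integral of (chi_P - chi_Q)^2 over [0, 1]^2."""
    def inside(rect, x, y):
        lo, hi = rect
        return lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]

    differing = 0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) / n, (j + 0.5) / n
            differing += inside(p, x, y) != inside(q, x, y)
    return differing / n ** 2


P = ([0.0, 0.0], [0.5, 0.5])
Q = ([0.25, 0.25], [0.75, 0.75])
print(squared_l2_distance(P, Q), grid_check(P, Q))  # 0.375 0.375
```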

The entire set of indicator functions on convex and closed sets is too large in order to render the test-statistic $T_N$ in (26) finite: it was shown in [12] (see also [ISTl, Chap. 8.4]) that the $\varepsilon$-covering number of the collection of all nonempty, closed and convex sets contained in the unit ball $\{x \in \mathbb{R}^d : |x| \le 1\}$ is of the same order as $\exp(\varepsilon^{(1-d)/2})$ (for $d \ge 2$) as $\varepsilon \to 0^+$. This proves that there cannot exist any constants $A$, $B$ and $\gamma$ such that (27) holds when $\Phi$ is this entire set.

For particular classes of convex sets, however, entropy estimates as in (27) are at hand. The collection $\Phi_r$ of indicator functions on $d$-dimensional rectangles in $[0,1]^d$ constitutes such an example:

Proposition B.7. There exists a constant $A = A(d) > 0$ such that

\[
D\big(u\delta, \{\phi \in \Phi_r : \|\phi\| \le \delta\}\big) \le A\,(u\delta)^{-4d}
\]

for all $u, \delta \in (0,1]$.

Proof. From [65, Thm. 2.6.7] it follows that the $\varepsilon$-covering number of $\Phi_r$ can be estimated by $A\varepsilon^{-2(V-1)}$ where $V$ denotes the VC-index of the set of subgraphs $\{(x,t) : t < \phi(x)\}$ for $\phi \in \Phi_r$. This in turn is equal to the VC-index of the collection of all rectangles in $[0,1]^d$ which is $2d+1$ [see 65, Ex. 2.6.1]. □

For certain subsets of $\Phi_r$ better estimates can be derived. We close this section with results for the systems $\Phi_s$ and $\Phi_2$ of indicator functions on all squares and dyadic partitions in $[0,1]^d$, respectively. We skip the proofs, for they are elementary but rather tedious.

Proposition B.8. There exists a constant $A = A(d) > 0$ such that

\[
D\big(u\delta, \{\phi \in \Phi_s : \|\phi\| \le \delta\}\big) \le A\,u^{-2(d+1)}\delta^{-2}, \quad \text{for all } u, \delta \in (0,1].
\]

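Proposition B.9. Let $d \ge 2$ and consider the system of all dyadic partitions in $[0,1]^d$, that is

\[
\mathcal{P}_2 := \big\{ Q \subset [0,1]^d : Q = 2^{-k}(i + [0,1]^d),\ k \in \mathbb{N},\ i = (i_1, \ldots, i_d) \in \mathbb{N}^d \big\}.
\]

Let $\Phi_2$ be the set of all indicator functions on elements in $\mathcal{P}_2$. Then, there exists a constant $A = A(d) > 0$ such that

\[
A^{-1}u^{-2}\delta^{-2} \le D\big(u\delta, \{\phi \in \Phi_2 : \|\phi\| \le \delta\}\big) \le A\,u^{-2}\delta^{-2}, \quad \text{for all } u, \delta \in (0,1].
\]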
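A lower bound of the capacity number for dyadic cubes can be made concrete by a simple counting argument. The following Python sketch is an illustration under stated assumptions and does not compute the capacity number itself: it only uses that the $2^{kd}$ dyadic cubes of side length $2^{-k}$ are pairwise disjoint up to null sets, so that their indicator functions have $L^2$-norm $2^{-kd/2}$ and mutual $L^2$-distance $\sqrt{2}\cdot 2^{-kd/2}$. The function name is hypothetical.

```python
def dyadic_level_witness(u, delta, d):
    """Size of the finest single dyadic level that is (u*delta)-separated inside the delta-ball.

    Returns 0 if no level qualifies. The returned family is a valid packing,
    hence its size is a lower bound for the capacity number.
    """
    best = 0
    k = 0
    # separation: squared distance 2 * 2**(-k*d) must exceed (u * delta)**2
    while 2.0 * 2.0 ** (-k * d) > (u * delta) ** 2:
        # norm constraint: squared norm 2**(-k*d) must not exceed delta**2
        if 2.0 ** (-k * d) <= delta ** 2:
            best = 2 ** (k * d)  # number of cubes on level k
        k += 1
    return best


for u in (0.25, 0.125):
    count = dyadic_level_witness(u, 0.5, 2)
    print(u, count, count * (u * 0.5) ** 2)  # the last column stays equal to 1.0
```

Whenever a level qualifies, the count lies between $2^{1-d}(u\delta)^{-2}$ and $2(u\delta)^{-2}$, so this witness grows like $(u\delta)^{-2}$ as $u \to 0$.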
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